PDB COMMUNITY FOCUS: JOHN L. MARKLEY, BMRB AND CENTER FOR EUKARYOTIC STRUCTURAL GENOMICS

John L. Markley received a Ph.D. in Biophysics from Harvard University in 1969, where he worked with Oleg Jardetzky and Elkan R. Blout. His graduate research included NMR studies of helix-coil transitions in polyamino acids and the preparation and NMR investigation of selectively deuterated proteins. The latter project was made possible through funding and access to excellent research facilities at the Merck Research Laboratories in Rahway, New Jersey, where he spent 30 months. As a National Institutes of Health Postdoctoral Fellow with Melvin P. Klein at the University of California, Berkeley, he made the transition from continuous-wave to pulse Fourier transform NMR spectroscopy and investigated NMR relaxation mechanisms. He joined the faculty of the Chemistry Department at Purdue University as an Assistant Professor in 1972, and by 1981 had moved up the ranks to Professor. He relocated to the Biochemistry Department at the University of Wisconsin-Madison in 1983, where he founded the National Magnetic Resonance Facility at Madison (1985), the BioMagResBank (BMRB; 1990), and the Center for Eukaryotic Structural Genomics (2000). He is currently Steenbock Professor of Biomolecular Structure and chairs the Graduate Program in Biophysics at the University of Wisconsin-Madison.

Q. What is the history behind the BMRB?

A. NMR is unique among biophysical approaches in its ability to provide a broad range of atomic-level information relevant to the structural, dynamic, and chemical properties of biological macromolecules. Since my days as a graduate student, I have been deeply impressed by the value of chemical information available from NMR data assigned to specific sites in proteins. In 1984, Eldon Ulrich and I wrote a comprehensive review of all assigned chemical shifts in proteins, which required digging information out of individual publications in the literature. In 1985, during a mini-sabbatical with the late Professor Yoshimasa Kyogoku at the Protein Research Institute in Osaka, Japan, I formulated the idea for organizing a publicly available data bank for assigned protein NMR parameters. In the meantime, the first NMR structures of proteins began appearing. Ulrich, Kyogoku, and I refined the idea for a data repository for NMR information about protein structure and dynamics and published a proposal in 1989. We secured funding for a pilot study from the National Library of Medicine, which enabled us to begin developing data models. These initially took the form of flat files with a rigid format. Later, BMRB adopted a relational database format and, following discussions with Helen Berman and other developers of mmCIF (which is a variant of the STAR format devised by Sydney Hall and Nick Spadaccini), eventually developed the current NMR-STAR format. The data exchange format created at BMRB has gained rapid and widespread acceptance in the community. NMR-STAR is extensible and thus can accommodate the addition of new NMR parameters of interest to the biomolecular NMR community. Its tag-value nature and the tabular organization of the data model make it easy to interconvert NMR-STAR with relational or XML formats.
BMRB's holdings have grown to include chemical shifts, J-couplings, relaxation rates, residual dipolar couplings, and chemical information derived from NMR investigations (such as hydrogen exchange rates, pKa values, and structural restraints). Newer types of data being collected are raw (time-domain) data for structure determinations (largely from structural genomics centers) and solid-state NMR data. Our early idea was that BMRB annotators would gather and enter data from the literature, but the enormous growth in the biomolecular NMR field and the reluctance of journals to publish this information soon made this impractical. As with the PDB, BMRB relies on depositions from scientists. We are grateful that the NLM has funded BMRB continuously since its founding. BMRB now has mirror sites in Florence, Italy, and Osaka, Japan. The Osaka site is beginning to take responsibility for data depositions from that part of the world.
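Editor's illustration: the remark above that NMR-STAR's tag-value and tabular organization makes it easy to move data into relational form can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not BMRB software: the loop below is a simplified, made-up fragment written in the style of NMR-STAR (the shift values are invented), and the helper names `parse_loop` and `load_into_sqlite` are hypothetical. The parser handles only whitespace-separated values in a single `loop_` block and ignores save frames, quoted strings, and multi-line values.

```python
import sqlite3

# Simplified, illustrative loop in the style of NMR-STAR.
# The values are invented and this is NOT a real BMRB entry.
STAR_LOOP = """
loop_
   _Atom_chem_shift.ID
   _Atom_chem_shift.Comp_ID
   _Atom_chem_shift.Atom_ID
   _Atom_chem_shift.Val

   1 ALA H  8.25
   2 ALA CA 52.30
   3 GLY N  108.90
stop_
"""


def parse_loop(text):
    """Return (category, columns, rows) for one simple loop_ block.

    Assumes whitespace-separated values with no quoting.
    """
    tags, values = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped in ("loop_", "stop_"):
            continue
        if stripped.startswith("_"):
            tags.append(stripped)            # a tag names one column
        else:
            values.extend(stripped.split())  # data values, row by row
    category = tags[0].split(".", 1)[0].lstrip("_")
    columns = [tag.split(".", 1)[1] for tag in tags]
    width = len(columns)
    if len(values) % width != 0:
        raise ValueError("value count is not a multiple of the tag count")
    rows = [tuple(values[i:i + width]) for i in range(0, len(values), width)]
    return category, columns, rows


def load_into_sqlite(conn, category, columns, rows):
    """Create one table per tag category and insert the loop rows as text."""
    column_sql = ", ".join(f'"{name}" TEXT' for name in columns)
    conn.execute(f'CREATE TABLE "{category}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{category}" VALUES ({placeholders})', rows
    )


category, columns, rows = parse_loop(STAR_LOOP)
conn = sqlite3.connect(":memory:")
load_into_sqlite(conn, category, columns, rows)

# The tag category became a table and each tag became a column.
result = conn.execute(
    'SELECT "Atom_ID", "Val" FROM "Atom_chem_shift" '
    'WHERE "Comp_ID" = ? ORDER BY rowid',
    ("ALA",),
).fetchall()
print(result)
```

The point of the sketch is the mapping itself: the tag category (the part before the dot) plays the role of a table name, each tag supplies a column name, and each row of loop values becomes a table row. Values are stored as text here; a fuller converter would consult the data dictionary for types.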

Q. What are the interactions between the BMRB and the RCSB PDB?

A. BMRB is a member of the RCSB and has close ties with the PDB. Our interactions are warm and collegial and span common interests in standards for data representation, software development, and data interchange. Recent collaboration with PDB has centered on unifying the nomenclature used for X-ray and NMR structures and the underlying data. BMRB and PDB have been pursuing the common goal of providing the tools for harvesting the full range of information in digital form that constitutes the normal 'Methods' section of an article in a journal such as the J. Biol. Chem. BMRB has adapted the PDB ADIT deposition software for data entry as a way of making data entry more uniform across the two data banks. Currently, data relevant to BMRB associated with NMR structures deposited at PDB are transferred automatically to BMRB for processing. BMRB and PDB are close to releasing jointly developed software that will provide one-stop data entry for NMR structures. This will simplify the task of depositors as well as streamline the work of annotators at BMRB and PDB.

Q. Does your work in structural genomics influence your work with the BMRB, and vice versa?

A. The goals of structural genomics are to enlarge knowledge of sequence-structure-function interrelationships and to lower the costs of solving structures. At the same time, its technology and products are to be made available to the community. Structural genomics both reinforces the kind of work that has gone on at the BMRB and the PDB and offers challenges to these data banks. In some ways, the data dictionary work at BMRB and PDB anticipated many of the demands of structural genomics. Both data banks were well prepared to handle increasing numbers of structures and the increasing level of detail about the experiments demanded by structural genomics. The challenges have been to provide streamlined data deposition and pre-validation tools. BMRB has worked closely with all structural genomics centers that utilize NMR spectroscopy to determine their needs and to seek suggestions for improving the operations of the data bank. These interactions have led to new developments at BMRB, such as the use of validation software developed at the Northeast Structural Genomics Consortium, the extension of the NMR-STAR format for chemical shifts to include probabilistic assignments as developed at the National Magnetic Resonance Facility at Madison, and a repository for collections of time-domain data sets used in structure determinations. Also, in response to the structural genomics community, BMRB has been increasing its links to other sites on the Web.

Q. The amount of experimental data to be archived is growing exponentially. What do you see as the future for managing this data?

A. We are confident of our ability at BMRB, given the current level of approved funding, to keep up with the growth in the field. Our strategy at BMRB is to develop procedures and software that enable us to manage data more efficiently. The new data deposition system mentioned above, which jointly handles coordinates as well as underlying data and information about the biological system and experiments conducted, allows for more automated data harvesting and validation. I anticipate that BMRB, like PDB, will become increasingly internationalized. The new data deposition site at Osaka represents a positive step in this direction. At first, data collected at Osaka will be transferred to Madison for annotation, but the plan is to develop this capability in Osaka.

Q. The size and complexity of structures being solved by X-ray and EM methods continue to increase over time. Is there a limit to the size of structures that can be investigated with NMR methods?

A. As recent publications from the laboratories of Kurt Wüthrich and others have shown, it is difficult to place an absolute molecular weight limit on NMR structures. With conventional methods of uniform 13C + 15N labeling, the practical limit for high-throughput NMR structure determinations is about 20 kDa. However, by expending additional effort and by utilizing 2H labeling, the bar can be raised to 30 kDa and above. Recent results from Masatsune Kainosho's laboratory with stereo-array isotope labeling suggest that this approach could increase the practical limit to 40 kDa. Selective labeling approaches also are increasing the sizes of RNA structures that can be solved by NMR. It is important to recognize that NMR can provide useful information about structure-function relationships even in the absence of a three-dimensional structure. BMRB captures this kind of information, which can be obtained from systems 150 kDa and larger.