PDB Newsletter -- Number 8 -- Winter 2001 -- RCSB
Published quarterly by the Research Collaboratory for Structural Bioinformatics
Links to this and previous PDB newsletters are available at http://www.rcsb.org/pdb/newsletter.html.
SNAPSHOT -- December 31, 2000
14,040 released atomic coordinate entries

Molecule Type
12,524  proteins, peptides, and viruses
   911  nucleic acids
   587  protein/nucleic acid complexes
    18  carbohydrates

Experimental Technique
11,557  diffraction and other
 4,489  structure factor files
 2,176  NMR
   916  NMR restraint files
   307  theoretical modeling
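
As a quick consistency check on the snapshot figures above, the short Python sketch below adds up the counts. It assumes (the newsletter does not say so explicitly) that structure factor files and NMR restraint files are supplementary data accompanying diffraction and NMR entries, so they are left out of the per-technique total. The variable names are illustrative only.

```python
# Snapshot counts as printed in the newsletter (December 31, 2000).
TOTAL_ENTRIES = 14_040

by_molecule_type = {
    "proteins, peptides, and viruses": 12_524,
    "nucleic acids": 911,
    "protein/nucleic acid complexes": 587,
    "carbohydrates": 18,
}

by_technique = {
    "diffraction and other": 11_557,
    "NMR": 2_176,
    "theoretical modeling": 307,
}

# Assumption: these are supporting experimental files, not separate entries.
supplementary_files = {
    "structure factor files": 4_489,
    "NMR restraint files": 916,
}

molecule_total = sum(by_molecule_type.values())
technique_total = sum(by_technique.values())

print("Sum by molecule type:", molecule_total)
print("Sum by technique:    ", technique_total)
print("Both match the total:", molecule_total == technique_total == TOTAL_ENTRIES)
```

Under that assumption, both breakdowns account for every released entry.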

Message from the PDB
Data Deposition and Processing
--PDB Deposition Statistics
--New ADIT Features
--PDB Focus: ADIT Help Services
Data Uniformity
--mmCIF Files from the Data Uniformity Project and Translation Software Released
--Data Uniformity Paper Published
Data Query, Reporting, Access, and Distribution
--New Query Features Available on the Beta Test Site
--PDB Web site Statistics
--PDB Focus: WWW User Guides
PDB Outreach
--The PDB and Structural Genomics
--Seven Ways of Looking at a Protein
--CD-ROM Set 94 Released
--PDB Focus: Get Educated
--PDB Focus: FAQ Lists
--Molecules of the Quarter: Ribosome, Rubisco, and Pepsin
Statement of Support

MESSAGE FROM THE PDB - Happy new millennium! Our resolutions for the new year are to build upon the accomplishments of the year 2000. We will continue to develop our software tools for deposition, processing, query, and reporting. Work on the Data Uniformity Project, which is described in the 2001 Database Issue of Nucleic Acids Research, will continue. We also hope to keep hearing from our users in the coming year about the new features you would like to see in the PDB resource.
With this newsletter, we will be mailing out printed copies. If you would like to receive a copy, please send your mailing address to info@rcsb.org.
In this quarter, the PDB will have exhibit booths at the Biophysical Society's Annual Meeting (Boston, MA; February 17-21) and the Pittsburgh Conference (New Orleans, LA; March 4-9). Hope to see you there!
--The PDB

PDB DEPOSITION STATISTICS - The PDB saw 766 structure depositions during the last quarter of 2000.
Of these structures, approximately 65% are to be held until publication; 21% are to be released at the end of processing; and 14% are to be held until a certain date.
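
For a rough sense of scale, the percentages above can be converted into approximate entry counts. This is a back-of-the-envelope sketch: the newsletter gives only rounded percentages, so the counts below are estimates, not reported figures, and the dictionary labels are paraphrases.

```python
# Depositions received in the last quarter of 2000, as reported above.
depositions = 766

# Release status, in percent of depositions (rounded values from the text).
release_status_percent = {
    "hold until publication": 65,
    "release after processing": 21,
    "hold until a set date": 14,
}

# Estimated number of entries per status, rounded to whole structures.
estimated_counts = {
    status: round(depositions * percent / 100)
    for status, percent in release_status_percent.items()
}

for status, count in estimated_counts.items():
    print(f"{status}: about {count} structures")
```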
Structures are deposited using ADIT at http://pdb.rutgers.edu/adit/ or http://pdbdep.protein.osaka-u.ac.jp/adit/, or using AutoDep at the EBI at http://autodep.ebi.ac.uk/. The RCSB also maintains a copy of AutoDep at http://pdb.rutgers.edu/~adbnl/.

NEW ADIT FEATURES - The ADIT servers at http://pdb.rutgers.edu/adit/ and http://pdbdep.protein.osaka-u.ac.jp/adit/ were updated with additional data items, enhanced help and examples, and updated pull-down menu values during this quarter.
In the X-ray view, depositors can now include more detailed information in the refinement and diffraction categories. In both X-ray and NMR views, the help and example descriptions for all items have been expanded. New pull-down menus, such as for source organism name, were included to make depositing data faster and more uniform.
Questions and comments about ADIT can be sent to deposit@rcsb.rutgers.edu.

PDB FOCUS: ADIT HELP SERVICES - An e-mail help desk, a deposition FAQ, and several tutorials are available for depositors using the AutoDep Input Tool (ADIT, http://pdb.rutgers.edu/adit/).
General deposition and processing questions or comments can be submitted to deposit@rcsb.rutgers.edu, and are generally answered within one business day. Information is also provided at the PDB Data Deposition and Processing FAQ at http://pdb.rutgers.edu/, which supplies links to several data deposition and processing resources. Tutorials for ADIT and the ADIT Validation Server are also available from the ADIT page. Sample "in progress" depositions are available at http://pdb.rutgers.edu:81/.

mmCIF FILES FROM THE DATA UNIFORMITY PROJECT AND TRANSLATION SOFTWARE RELEASED - Approximately 1,000 mmCIF formatted files for nucleic acid-containing crystal structures are now available from the PDB beta ftp site at ftp://beta.rcsb.org/pub/pdb/uniformity/data/mmCIF/. These entries were curated by the Nucleic Acid Database project and revisited for data uniformity processing.
The files follow the latest version of the mmCIF dictionary supplemented by an exchange dictionary developed by the PDB and the European Bioinformatics Institute. This exchange dictionary can be obtained from http://pdb.rutgers.edu/mmcif/.
An application program called CIFTr is available for translating files in mmCIF format into files in PDB format. CIFTr works on UNIX platforms, and can be downloaded at http://pdb.rutgers.edu/software/. CIFTr also provides the option of producing a file with a blank chain ID field for structures with a single chain, and the option of producing files with standard IUPAC hydrogen nomenclature for standard L-amino acids.
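
To give a feel for what such a translation involves, here is a minimal Python sketch that formats one mmCIF atom_site row as a fixed-column PDB ATOM record. It is not CIFTr's code and does not reproduce its behavior; the function name and the `blank_chain_id` flag are hypothetical, chosen only to mirror the blank chain ID option described above. The dictionary keys are standard mmCIF atom_site item names, and alternate locations, insertion codes, and hydrogen renaming are deliberately ignored.

```python
def atom_site_to_pdb_line(row, blank_chain_id=False):
    """Format one mmCIF atom_site row (a dict of strings) as a PDB ATOM record."""
    name = row["label_atom_id"]
    # PDB convention: names of atoms with a one-letter element symbol start in
    # column 14, so short names get a leading space.
    if len(name) < 4 and len(row["type_symbol"]) == 1:
        name = " " + name
    # Optionally leave the chain ID column blank (e.g. single-chain structures).
    chain_id = " " if blank_chain_id else row["auth_asym_id"][:1]
    return (
        f"ATOM  {int(row['id']):5d} {name:<4s} {row['label_comp_id']:>3s} "
        f"{chain_id}{int(row['auth_seq_id']):4d}    "
        f"{float(row['Cartn_x']):8.3f}{float(row['Cartn_y']):8.3f}"
        f"{float(row['Cartn_z']):8.3f}{float(row['occupancy']):6.2f}"
        f"{float(row['B_iso_or_equiv']):6.2f}          {row['type_symbol']:>2s}"
    )


# Illustrative values only; not taken from any particular PDB entry.
example_row = {
    "id": "2", "type_symbol": "C", "label_atom_id": "CA",
    "label_comp_id": "ALA", "auth_asym_id": "A", "auth_seq_id": "1",
    "Cartn_x": "11.104", "Cartn_y": "6.134", "Cartn_z": "-6.504",
    "occupancy": "1.00", "B_iso_or_equiv": "13.79",
}

print(atom_site_to_pdb_line(example_row))
print(atom_site_to_pdb_line(example_row, blank_chain_id=True))
```

A real translator also has to handle HETATM records, header and connectivity records, and names that do not fit the fixed columns, which is why a dedicated tool is preferable to an ad hoc script.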

DATA UNIFORMITY PAPER PUBLISHED - The 2001 database issue of Nucleic Acids Research includes a PDB paper entitled "The PDB data uniformity project," which describes the ongoing project to address inconsistencies in the archive.

T.N. Bhat, P.E. Bourne, Z. Feng, G. Gilliland, S. Jain, V. Ravichandran, B. Schneider, K. Schneider, N. Thanki, H. Weissig, J. Westbrook and H.M. Berman (2001): "The PDB data uniformity project." Nucleic Acids Research 29 (1), pp. 214-218. http://nar.oupjournals.org/cgi/content/full/29/1/214

Updates on the Data Uniformity Project are available at http://www.rcsb.org/pdb/uniformity.html.

NEW QUERY FEATURES AVAILABLE ON THE BETA TEST SITE - In a continuing effort to improve the capabilities of the PDB, the RCSB has released the following new features on the beta test site at http://beta.rcsb.org/pdb/:

Exact and Partial Word Match
The SearchFields and SearchLite interfaces now support both partial and exact word match queries on the text search fields. The exact word match feature is available as an option that the user can select. Partial word searching functionality remains available as the default, since this is also an important capability for certain queries.
Keyword searches through the SearchLite interface as well as the "Text Search" form field on the SearchFields form fully support matching on word boundaries. For example, searches for the keyword "man" with the "exact word match" option checked will not match entries containing words like "human" or "manitose".
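
The difference between the two modes can be illustrated with a small Python sketch. This is not the PDB's search code: the function names are hypothetical, and case-insensitive matching is an assumption made here for illustration.

```python
import re


def partial_match(keyword, text):
    """Partial word match: the keyword may occur anywhere, even inside a word."""
    return keyword.lower() in text.lower()


def exact_word_match(keyword, text):
    """Exact word match: the keyword must be delimited by word boundaries."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Illustrative titles, not actual PDB entries.
titles = ["human lysozyme", "mannose-binding protein", "the man-made peptide"]

for title in titles:
    print(f"{title!r}: partial={partial_match('man', title)}, "
          f"exact={exact_word_match('man', title)}")
```

With these sample titles, the partial search matches all three, while the exact word search matches only the last, where "man" stands as a word of its own.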
For all other textual queries, "exact word match" currently requires the entire field value to match: a query for "kinase" will not match values like "protein kinase" or "tyrosine kinase". This limitation will be removed in the near future to provide the same functionality as with keyword searches. In addition, keyword searches are now possible with arbitrary numbers of optionally nested parentheses.
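
Nested parentheses in a keyword query are typically handled with a small recursive parser. The sketch below shows one way to do it in Python. It is only an illustration: the newsletter does not describe the query grammar, so the AND, OR, and NOT operators, their precedence, and the function names are all assumptions, and keywords are matched as whole words.

```python
import re


def tokenize(query):
    """Split a query into parentheses and bare words."""
    return re.findall(r"\(|\)|[^\s()]+", query)


def evaluate(query, text):
    """Return True if text satisfies a boolean keyword query.

    Assumed grammar: OR binds loosest, then AND, then NOT; parentheses
    may be nested to any depth.
    """
    words = set(re.findall(r"\w+", text.lower()))
    tokens = tokenize(query)
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos].upper() == "OR":
            pos += 1
            rhs = parse_and()  # always parse the operand so pos advances
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_not()
        while pos < len(tokens) and tokens[pos].upper() == "AND":
            pos += 1
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        nonlocal pos
        if pos < len(tokens) and tokens[pos].upper() == "NOT":
            pos += 1
            return not parse_not()
        return parse_atom()

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token == ")":
            raise ValueError("unexpected closing parenthesis")
        return token.lower() in words

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result


print(evaluate("(kinase AND (human OR mouse)) AND NOT inhibitor",
               "Human tyrosine kinase domain"))
```

Each level of parentheses simply re-enters the top-level rule, so the nesting depth is limited only by the recursion limit of the interpreter.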

Title Record Results
Title records, when available, have been added to the search results on the Query Result Browser. The ability to sort results based on certain criteria has also been implemented, and users will be able to customize the output of these results in the future.

Title Record Search
The capability to search on structure titles has also been added. The title is based on information provided by the author which describes the structure in a brief sentence.
We appreciate any comments you may have on these features. Please send your feedback to notify@rcsb.org.

PDB WEB SITE STATISTICS - The access statistics for the primary PDB Web site at http://www.rcsb.org/pdb/ reveal a large increase in the number of hits it has received during this past quarter. The daily average and monthly total number of hits, and the number of files downloaded, have increased significantly since this past summer.
While the www.rcsb.org address continues to receive the most traffic, use of the mirror sites and beta test site continues to increase. PDB users are encouraged to access their proximate RCSB mirror sites at Rutgers (http://rutgers.rcsb.org/), NIST (http://nist.rcsb.org/), the Cambridge Crystallographic Data Centre (http://pdb.ccdc.cam.ac.uk/), the National University of Singapore (http://pdb.bic.nus.edu.sg/), Osaka University (http://pdb.protein.osaka-u.ac.jp/), and the Universidade Federal de Minas Gerais (http://www.pdb.ufmg.br/). The beta test site can be accessed at http://beta.rcsb.org/pdb/.
                 Daily Average                    Monthly Totals
December 2000    89,556    68,580    44,987       75,063,794    2,125,981    2,776,245
November 2000
October 2000

PDB FOCUS: WWW USER GUIDES - A list of useful guides to PDB's Web-based capabilities is available from the PDB home page at http://www.rcsb.org/pdb/info.html#PDB_Users_Guides.
Topics covered include tutorials on using ADIT and the Validation Server, instructions for downloading files and optimizing search results using the different PDB interfaces, how to interpret query results, trouble-shooting tips for molecular graphics applications, and documentation on the directory structure of the FTP site. Users can also obtain details about SearchFields features directly from that interface by clicking on their field titles. These guides are in place to provide PDB users with helpful information about using the PDB's many features.

THE PDB AND STRUCTURAL GENOMICS - As part of Nature Structural Biology's Structural Genomics Supplement issue (http://www.nature.com/nsb/structural_genomics/index.html), the PDB has a paper entitled "The Protein Data Bank and the challenge of structural genomics" (Berman et al. 2000). This document describes how the PDB's systems for the processing, exchange, query, and distribution of data will enable many aspects of high throughput structural genomics in the next few years.

H.M. Berman, T.N. Bhat, P.E. Bourne, Z. Feng, G. Gilliland, H. Weissig, J. Westbrook (2000): "The Protein Data Bank and the challenge of structural genomics." Nature Structural Biology 7 (Supp), pp. 957-959.

SEVEN WAYS OF LOOKING AT A PROTEIN - The PDB was used to create the images for "Seven Ways of Looking at a Protein," an article by Clay Shirky in the online magazine FEED, available at http://www.feedmag.com/feature/fr409_master.html.

CD-ROM SET 94 RELEASED - This quarter's CD-ROM set (Number 94) was released in October.
The CD-ROMs contain the full release of PDB structure files, structure factor files, NMR constraints, and some contributed software and resources. The directory structure is the same as on earlier CD-ROMs, except that two directory names were changed with the April 2000 release: Distr was replaced by Entries, and Nonst by Strucfac.
The current CD-ROM set includes the 13,270 structures as of the update on September 27, 2000. Five CD-ROMs are required to contain these structures in compressed (gzip) format. The next CD-ROM set will be released in January 2001.
The PDB releases these CD-ROMs on a quarterly basis at no cost. Ordering information is available at http://www.rcsb.org/pdb/cdrom.html.

PDB FOCUS: GET EDUCATED - One of the goals of the RCSB is to educate the community about the PDB portal and related topics of interest. A collection of valuable links is available from the PDB Get Educated page at http://www.rcsb.org/pdb/education.html. This resource contains a wealth of information for audiences ranging from elementary level students to undergraduates to the general public. Included are links to such features as general information about proteins and nucleic acids, several articles and animated presentations on the PDB, protein documentaries providing detailed descriptions of specific well-characterized proteins from different families, a query interface for the novice user, interactive 3-dimensional tutorials and college course materials, and an illustrated glossary of crystallographic and NMR terminology.
Suggestions for additions to this page are appreciated and can be sent to info@rcsb.org.

PDB FOCUS: FAQ LISTS - In order to assist PDB users, frequently asked questions with their corresponding answers have been compiled into two lists. One list of general questions and answers is available under "Contact Us" at http://www.rcsb.org/pdb/pdb_help.html#faqs. Another list of file format inquiries and responses is included with the list of "File Formats and Dictionaries" at http://www.rcsb.org/pdb/info.html#File_Formats_and_Dictionaries. Users are encouraged to utilize these helpful resources.
Please send any unanswered questions to info@rcsb.org.

MOLECULES OF THE QUARTER: RIBOSOME, RUBISCO, AND PEPSIN - The PDB has continued to feature its popular "Molecule of the Month" piece. Written and drawn by David S. Goodsell, an assistant professor of molecular biology at The Scripps Research Institute in La Jolla, California, these articles provide an overview of significant milestones in the growth of the PDB's macromolecular structure data for a diverse audience. Here is a sample of the information that is presented in this feature:

Ribosome: The Elusive Protein Factory -- October, 2000 -- Protein synthesis is the major task performed by living cells. For instance, roughly one third of the molecules in a typical bacterial cell are dedicated to this central task. Protein synthesis is a complex process involving many molecular machines. You can look at many of these molecules in the PDB, including DNA, DNA polymerases and RNA polymerases; a host of repressors, DNA repair enzymes, topoisomerases, and histones; tRNA and acyl-tRNA synthetases; and molecular chaperones. This month, for the first time, you can also look at the factory of protein synthesis in atomic detail.
The ribosome has been under the scrutiny of scientists for decades. Electron microscopy has yielded an increasingly detailed view over the years, defining the overall shape of individual ribosomes and differences in this shape for ribosomes from different species. More recently, detailed electron micrograph reconstructions have studied the interaction of ribosomes with messenger RNA, transfer RNA and the protein elongation factors. This legacy of morphological work lays the groundwork on which the atomic structures may be understood.
Ribosomes are composed of two subunits: a large subunit and a small subunit. Of course, the term "small" is used in a relative sense here: both the large and the small subunits are huge compared to a typical protein. Both subunits are composed of long strands of RNA dotted with protein chains. When synthesizing a new protein, the two subunits lock together with a messenger RNA trapped in the space between. The ribosome then walks down the messenger RNA three nucleotides at a time, building a new protein piece-by-piece.
The structure of the large subunit is available in PDB entry 1ffk. The large subunit contains the active site of the ribosome: the site that creates the new peptide bonds when proteins are synthesized. This structure and several others with inhibitors bound provide strong evidence that the ribosome is a ribozyme. Enzymes typically use amino acids to catalyze chemical reactions, but the ribosome appears to use an adenine RNA nucleotide to perform its synthetic task.
The large subunit is composed of two RNA strands. Dozens of proteins bind on the surface of the ribosome. Many have long, snaky tails that extend into the body of the ribosome, gluing the RNA strands into their proper shape. Several of the proteins were not seen in this crystallographic structure, perhaps because they are too flexible. Approximate shapes for these proteins form two prominent stalks which are commonly used as landmarks in electron micrographs.
The structure of the small subunit is available in the PDB entries 1fka, 1fjf, and 1fjg. The small subunit is in charge of information flow during protein synthesis. It initially finds a messenger RNA strand and, after combining with a large subunit, ensures that each codon in the message is paired with the anticodon in the proper transfer RNA. The messenger RNA is thought to enter through a small hole and then extend up into the "decoding center" in the cleft between the "head" at one end and the "body" at the other. The messenger RNA does not have to thread through this hole like a needle, however, because the hole is actually formed by a loop of the ribosomal RNA, which can open like a latch to admit the messenger.
Before jumping into these structures, be prepared. Both the large subunit and the small subunit are enormous complexes with many atoms: the structure of the large subunit in PDB entry 1ffk contains over 64,000 atoms, even though the authors chose to release only alpha carbon positions for the proteins, and the small subunit structure in 1fka, also with partial structures for the proteins, contains almost 35,000 atoms. Many interactive display programs become very sluggish when working on structures this large.
The proposed active site in the large ribosomal subunit is comprised of several nucleic acid bases, potassium, and hydrogen. Adenine is thought to perform the synthesis reaction. Two guanines and a potassium ion serve to activate this adenine through a series of hydrogen bonds.

Rubisco: Fixing Carbon -- November, 2000 -- Carbon is essential to life. All of our molecular machines are built around a central scaffolding of organic carbon. Unfortunately, carbon in the earth and atmosphere is locked in highly oxidized forms, such as carbonate minerals and carbon dioxide gas. In order to be useful, this oxidized carbon must be "fixed" into more organic forms, rich in carbon-carbon bonds and decorated with hydrogen atoms. Powered by the energy of sunlight, plants perform this central task of carbon fixation.
Inside plant cells, the enzyme ribulose bisphosphate carboxylase/oxygenase (rubisco) forms the bridge between life and the lifeless, creating organic carbon from the inorganic carbon dioxide in the air. Rubisco takes carbon dioxide and attaches it to ribulose bisphosphate, a short sugar chain with five carbon atoms. Rubisco then clips the lengthened chain into two identical phosphoglycerate pieces, each with three carbon atoms. Phosphoglycerates are familiar molecules in the cell, and many pathways are available to use them. Most of the phosphoglycerate made by rubisco is recycled to build more ribulose bisphosphate, which is needed to feed the carbon-fixing cycle. But one out of every six molecules is skimmed off and used to make sucrose (table sugar) to feed the rest of the plant, or stored away in the form of starch for later use.
In spite of its central role, rubisco is remarkably inefficient. As enzymes go, it is painfully slow. Typical enzymes can process a thousand molecules per second, but rubisco fixes only about three carbon dioxide molecules per second. Plant cells compensate for this slow rate by building lots of the enzyme. Chloroplasts are filled with rubisco, which comprises half of the protein. This makes rubisco the most plentiful single enzyme on the Earth.
Rubisco also shows an embarrassing lack of specificity. Unfortunately, oxygen molecules and carbon dioxide molecules are similar in shape and chemical properties. In proteins that bind oxygen, like myoglobin, carbon dioxide is easily excluded because carbon dioxide is slightly larger. But in rubisco, an oxygen molecule can bind comfortably in the site designed to bind to carbon dioxide. Rubisco then attaches the oxygen to the sugar chain, forming a faulty oxygenated product. The plant cell must then perform a costly series of salvage reactions to correct the mistake.
Plants and algae build a large, complex form of rubisco, composed of eight copies of a large protein chain and eight copies of a smaller chain. The protein in the PDB entry 1rcx is taken from spinach leaves. The tobacco enzyme may be found in 1rlc. Many enzymes form similar symmetrical complexes. Often, the interactions between the different chains are used to regulate the activity of the enzyme in the process known as allostery. Rubisco, however, seems to be rigid as a rock, with each of the active sites acting independently of one another. In fact, photosynthetic bacteria build a smaller rubisco (shown in PDB entry 9rub) composed of only two chains, which performs its catalytic task just as well. So, why do plants build a large complex? The answer might lie in the crowded conditions under which rubisco performs its job. By packing many chains together into a tight complex, the protein reduces the surface that must be wetted by the surrounding water. This allows more protein chains, and thus more active sites, to be packed into the same space.
The active site of rubisco is arranged around a magnesium ion. The magnesium ion is held tightly by three amino acids, including a surprising modified form of lysine. An extra carbon dioxide molecule is attached firmly to the end of the snaky lysine sidechain. In plant cells, this "activator" carbon dioxide, which is different from the carbon dioxide molecules that are fixed in the reaction, is attached to rubisco during the day, turning the enzyme "on," and removed at night, turning the enzyme "off." The exposed side of the magnesium ion is then free to bind to both ribulose bisphosphate, holding onto two oxygen atoms, and the carbon dioxide molecule that will be attached to sugar. In the PDB entry 8ruc structure, the carbon dioxide is already attached to the sugar. You will find that this structure includes only one half of the entire rubisco complex--if you are interested in looking at the whole rubisco molecule, the structure in 1rcx contains all sixteen chains.

Pepsin: A Piece of Scientific History -- December, 2000 -- During the holiday season, we often place greater demands on our digestive enzymes than at other times of the year. Our digestive system contains a host of tough, stable enzymes designed to seek out those rich holiday treats and break them into small pieces. Pepsin is the first in a series of enzymes that digest proteins. In the stomach, protein chains bind in the deep active site groove of pepsin, and are broken into smaller pieces. Then, a variety of proteases and peptidases in the intestine finish the job. The small fragments--amino acids and dipeptides--are then absorbed by cells for use as metabolic fuel or construction of new proteins.
Enzymes that digest proteins pose a real challenge. The enzyme must be constructed inside the cell, but controlled in some manner so that it doesn't immediately start digesting the cell's own proteins. To solve this problem, pepsin and many other protein-cutting enzymes are created as inactive "proenzymes," which may then be activated once safely outside the cell. Pepsin is constructed with an extra 44 amino acids which block the large active site groove and hobble the enzyme. In the stomach, this extraneous chain is clipped off and the enzyme begins its destructive campaign.
For several reasons, digestive enzymes are attractive candidates for scientific study. They are easily isolated and present in large amounts in digestive juices. They are also extraordinarily stable, because they perform their jobs under the harsh conditions present in the digestive system. The reactions catalyzed by digestive enzymes are also easily followed: you can add them to a protein such as gelatin and watch it lose its gel-like consistency. In the 19th century, pepsin was the first enzyme to be discovered, and later, pepsin was the second enzyme to be crystallized (after urease). These crystals played an important role in showing that enzymes were proteins and that they had a defined structure. Today, the structure of pepsin, determined from similar crystals, is available in PDB entry 5pep and several others.
Pepsin is one example of a group of enzymes termed "acid proteases." In the case of pepsin, this name is doubly appropriate. Pepsin works its best in strong hydrochloric acid. But the similarity with other enzymes refers to a second type of acid. The active site of the acid proteases relies on two acidic aspartate amino acids (hence the alternative name "aspartic proteases"), which activate a water molecule and use it to cleave protein chains.
The acid proteases have evolved to fill several functional roles in different organisms. Pepsin (PDB entry 5pep) is optimized for digestion of food in the acidic environment of the stomach. It is very promiscuous, cleaving proteins in many different places. Chymosin (PDB entry 4cms) is made by young calves to break down milk proteins. A purified form of chymosin, taken from calf stomach, has been used for centuries to curdle milk in the production of cheese. Cathepsin D (PDB entry 1lyb) digests proteins inside lysosomes, the tiny stomachs inside cells. Other cellular acid proteases, such as renin (PDB entry 1hrn), are designed to make very specific cuts in one particular protein, aiding in the maturation of a hormone or structural protein. Endothiapepsin (PDB entry 4ape) is made by a fungus and excreted into the surrounding environment, breaking up the surrounding proteins and allowing the fungus to feed on the pieces.
Pepsin uses a pair of aspartate residues to perform the protein cleavage reaction. In an example of parallel evolution (where two organisms independently develop the same method for solving a problem), the mechanism is similar to that used by HIV protease, discussed in a previous Molecule of the Month.

STATEMENT OF SUPPORT - The PDB is supported by funds from the National Science Foundation, the Office of Biological and Environmental Research at the Department of Energy, and two units of the National Institutes of Health: the National Institute of General Medical Sciences and the National Library of Medicine, in addition to resources and staff made available by the host institutions.
Contact information for all RCSB Members is available at http://www.rcsb.org/pdb/rcsb-group.html.
PDB job listings are available at http://www.rcsb.org/pdb/jobs.html.
Weekly PDB news is available on the Web at http://www.rcsb.org/pdb/latest_news.html.
This newsletter is available on the Web in HTML format at http://www.rcsb.org/pdb/newsletter/.