PDB Newsletter -- Number 7 -- Fall 2000 -- RCSB
Published quarterly by the Research Collaboratory for Structural Bioinformatics
Links to this and previous PDB newsletters are available at http://www.rcsb.org/pdb/newsletter.html.
SNAPSHOT -- October 1, 2000
13,270 released atomic coordinate entries

Molecule Type
  11,806  proteins, peptides, and viruses
     567  nucleic acids
     879  protein/nucleic acid complexes
      18  carbohydrates

Experimental Technique
  10,900  diffraction and other
   4,129  structure factor files
   2,071  NMR
     847  NMR restraint files
     299  theoretical modeling

Message from the PDB
Data Deposition and Processing
--PDB Deposition Statistics
--PDB Focus: The ADIT Validation Server
--Data Deposited and Processed Using the ADIT Mirror at Osaka University
Data Uniformity and NMR
--Data Uniformity and Enzyme Classification
--PDB's NMR Task Force at ICMRBS Conference
Data Query Reporting, Access, and Distribution
--New Query and Reporting Features
--PDB Web site Statistics
PDB Outreach
--PDB Activities at Summer Conferences
--PDB CD-ROM Set 93 Released
--Molecules of the Quarter: Nucleosome, Restriction Enzymes, and Lysozyme
Statement of Support

MESSAGE FROM THE PDB - Two years ago, the Research Collaboratory for Structural Bioinformatics (RCSB) was awarded the grant to manage the Protein Data Bank (PDB). We are pleased to report that the project continues to grow, as detailed in this newsletter. From added query features to the establishment of an ADIT deposition and processing site at Osaka University in Japan, the PDB has been dedicated this quarter to expanding the capabilities of this resource.
A detailed look at the first full year of the RCSB's operation of the PDB (July 1, 1999 - June 30, 2000) is available in the PDB Annual Report. This document, which will be available after November 1, 2000, highlights PDB functions, accomplishments during this period, and plans for the coming year. If you would like a copy of this report, please send mail to AnnualReport@rcsb.org and include your postal address. As always, your questions and comments are welcome at info@rcsb.org.
--The PDB

PDB DEPOSITION STATISTICS - In the third quarter of 2000, 676 structures were deposited to the PDB using ADIT -- 215 in July, 231 in August, and 230 in September. Approximately 16% of these entries were deposited with a HOLD release status; 63% with a hold until publication status; and 21% with an immediate release status.

PDB FOCUS: THE ADIT VALIDATION SERVER - Depositors are encouraged to use the ADIT Validation Server (http://pdb.rutgers.edu/validate/) prior to the deposition of a structure to the PDB. The Validation Server may be used to check a structure at any time during structure determination and refinement. There is no limit to the number of times it can be used.
The RCSB developed the Validation Server to allow users to check the format consistency of coordinates (Precheck) and to create validation reports about a structure before deposition (Validation).
To use the Validation Server, the coordinate file should be in either PDB or mmCIF format. The structure factor file must be in mmCIF format (see http://pdb.rutgers.edu/sf_cif.html for more information).
The Precheck step will produce a brief report identifying any changes that need to be made in your data files in order to obtain a validation report.
The Validation step produces a validation report which includes an Atlas entry, a summary report, and a collection of structural diagnostics including bond distance and angle comparisons, torsion angle comparisons, base morphology comparisons (for nucleic acids), and molecular graphic images. Reports from PROCHECK (Laskowski et al. 1993), NUCheck (Feng et al. 1998), and SFCheck (Vaguine et al. 1999) are made available.
Tutorials are available online at http://pdb.rutgers.edu/validate/docs/tutorial.html.

Laskowski, RA, MacArthur, MW, Moss, DS, and Thornton, JM. J. Appl. Cryst., 1993; 26:283-291.
Feng, Z, Westbrook, J, and Berman, HM. NUCheck. 1998. NDB-407 Rutgers University, New Brunswick, NJ.
Vaguine, AA, Richelle, J, and Wodak, SJ. Acta Crystallogr., 1999; D55:191-205.

DATA DEPOSITED AND PROCESSED USING THE ADIT MIRROR AT OSAKA UNIVERSITY - During the past three months, depositions have been accepted at the ADIT site established at the Institute for Protein Research at Osaka University in Osaka, Japan. These entries have been processed by staff at the Laboratory of Protein Informatics (Head, Professor Haruki Nakamura) at the Institute for Protein Research at Osaka University using the same ADIT tools and procedures as the RCSB. These entries are automatically mirrored to the RCSB's processing database and are released into the PDB archive. Under Dr. Masami Kusunoki's direction, Dr. Genji Kurisu, Reiko Igarashi, and Takashi Kosada have processed over forty structures deposited at the new ADIT site.
ADIT is available at http://pdb.rutgers.edu/adit and http://pdbdep.protein.osaka-u.ac.jp/adit/.

DATA UNIFORMITY AND ENZYME CLASSIFICATION - In the PDB's ongoing effort to improve the reliability of searches, the data uniformity project examined how to query enzymes accurately according to the Enzyme Commission classification, including the EC number and name. The enzyme classification was derived directly from the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB). The PDB would like to acknowledge the work of Dietmar Schomburg of the Institute of Biochemistry at the University of Cologne, project leader of the BRENDA database, on the enzyme classification remediation.
EC numbers were extracted from the PDB files either from the PDB Compound (COMPND) records or by a text search of the PDB file. A relational table associating EC number with compound name was constructed for all PDB enzyme entries. This table was then used along with the nomenclature used by the Enzyme Commission to search by compound name and identify the missing EC numbers. A manual verification of all the data was used to make the final EC assignments in the database. A description of query and reporting on EC numbers is available below.
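The extraction and gap-filling steps described above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure the PDB actually ran: the regular expression, the function names, the entry identifiers, and the sample records are all hypothetical, and real COMPND records vary in how they spell the EC annotation.

```python
import re

# Assumed pattern: accepts "EC: 3.2.1.17" as well as the "E.C.3.2.1.17" spelling.
EC_PATTERN = re.compile(r"\bE\.?C\.?\s*:?\s*(\d+(?:\.(?:\d+|-)){3})")

def extract_ec_numbers(pdb_text):
    """Look for EC numbers in COMPND records first, then in the whole file."""
    compnd = "\n".join(
        line for line in pdb_text.splitlines() if line.startswith("COMPND")
    )
    found = EC_PATTERN.findall(compnd) or EC_PATTERN.findall(pdb_text)
    return list(dict.fromkeys(found))  # keep file order, drop duplicates

def build_name_table(entries):
    """Relational table: normalized compound name -> EC numbers seen with it."""
    table = {}
    for entry in entries:
        if entry["ec"]:
            key = entry["name"].strip().lower()
            table.setdefault(key, set()).update(entry["ec"])
    return table

def propose_missing(entries, table):
    """Suggest EC numbers for entries that lack one, matched by compound name.

    The suggestions are candidates only; the text above notes that the
    final assignments were verified manually.
    """
    proposals = {}
    for entry in entries:
        if not entry["ec"]:
            candidates = table.get(entry["name"].strip().lower())
            if candidates:
                proposals[entry["id"]] = sorted(candidates)
    return proposals

# Hypothetical entries; the identifiers are placeholders, not PDB IDs.
sample_entries = [
    {"id": "entry1", "name": "Lysozyme",
     "ec": extract_ec_numbers("COMPND    LYSOZYME (E.C.3.2.1.17)")},
    {"id": "entry2", "name": "LYSOZYME",
     "ec": extract_ec_numbers("COMPND    LYSOZYME")},
]
name_table = build_name_table(sample_entries)
suggestions = propose_missing(sample_entries, name_table)
```

Here `suggestions` flags `entry2` as a candidate for the EC number already recorded for `entry1`, which is the kind of match a curator would then confirm or reject.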

PDB'S NMR TASK FORCE AT ICMRBS CONFERENCE - The PDB's NMR Task Force brings together experts to provide the PDB with advice on issues that are specific to NMR structure files. This group met during the XIX International Conference on Magnetic Resonance in Biological Systems (ICMRBS), August 20-25, 2000, in Florence, Italy. The Task Force discussed IUPAC nomenclature, representative structures, and constraints files.
The next meeting of this group will take place at the Keystone Meeting - Frontiers of NMR in Molecular Biology VII, to be held in Big Sky, Montana, on January 20-26, 2001.

NEW QUERY AND REPORTING FEATURES - The RCSB continues to enhance and upgrade the capabilities of the PDB customizable search interface and reporting output. Several new query and reporting features, previously available on the PDB beta test site, have been moved to the main PDB Web site and its mirrors. These features include:

Number of Chains and Source Organism
It is now possible to use the number of chains, as derived from the PDB SEQRES records, as a search criterion via the SearchFields interface. To do so, select the custom option "Number of Chains and Chain Length" and generate a new form to make this field available.
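As a rough illustration of how chain counts and chain lengths can be read from SEQRES records, the sketch below parses the fixed-column layout of the PDB format (chain identifier in column 12, residue count in columns 14-17). The function name and the sample records are assumptions made for this example; it does not describe how the PDB query engine itself derives these values.

```python
def chains_from_seqres(pdb_text):
    """Return {chain_id: number_of_residues} from SEQRES records."""
    chains = {}
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            chain_id = line[11]            # column 12 (0-based index 11)
            num_residues = int(line[13:17])  # columns 14-17
            chains[chain_id] = num_residues
    return chains

# Illustrative two-chain sample (21 and 30 residues).
SAMPLE = """\
SEQRES   1 A   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU
SEQRES   2 A   21  TYR GLN LEU GLU ASN TYR CYS ASN
SEQRES   1 B   30  PHE VAL ASN GLN HIS LEU CYS GLY SER HIS LEU VAL GLU
SEQRES   2 B   30  ALA LEU TYR LEU VAL CYS GLY GLU ARG GLY PHE PHE TYR
SEQRES   3 B   30  THR PRO LYS THR
"""

chain_lengths = chains_from_seqres(SAMPLE)
number_of_chains = len(chain_lengths)
```

A file with a single unlabeled chain carries a blank chain identifier, which this sketch counts as one chain keyed by a space character.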
To query by source organism, select the custom option "Source" from the SearchFields interface and generate a new form. The scientific names for source were parsed from the Molecular Modeling Database (MMDB) assignments of the National Center for Biotechnology Information (NCBI). Users may also substitute a common name for a scientific name, such as 'human' for 'Homo sapiens'. The curated data for source is currently available on the beta test site (http://beta.rcsb.org) and will soon become available on the main PDB site and its mirrors.

Enzyme Classification
As described above, a major development has been the accurate query of enzymes according to the Enzyme Commission classification. A complete enzyme classification of PDB structures, together with the enzyme nomenclature, enables users to identify all structures available for a particular enzyme class at any of the four enzyme classification levels.
A convenient way of accessing all structures belonging to a particular enzyme class is provided with the EC Classification Browser linked to the SearchFields interface. To locate this information, simply choose the custom option "EC Number and Classification" and generate a new form.
The EC values are also available in the Structure Explorer pages.
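The idea of browsing at any of the four classification levels can be sketched as a field-by-field comparison of EC numbers. The function names and the entry identifiers below are hypothetical and chosen for illustration only; this is not the implementation behind the EC Classification Browser.

```python
def ec_prefix(ec_number, level):
    """Return the first `level` fields of an EC number (level 2 of 3.2.1.17 is 3.2)."""
    if not 1 <= level <= 4:
        raise ValueError("EC numbers have four classification levels")
    return ".".join(ec_number.split(".")[:level])

def find_by_class(entries, ec_class):
    """Return the IDs whose EC number falls under ec_class, given at any level.

    Fields are compared one by one, so the class "3.1" does not
    accidentally match an EC number such as 3.11.1.1.
    """
    fields = ec_class.split(".")
    return [
        entry_id
        for entry_id, ec in entries.items()
        if ec.split(".")[: len(fields)] == fields
    ]

# Placeholder identifiers paired with EC numbers for illustration.
sample = {
    "entry_a": "3.2.1.17",   # lysozyme
    "entry_b": "3.1.21.4",   # type II restriction endonuclease
    "entry_c": "1.1.1.1",    # alcohol dehydrogenase
}

hydrolases = find_by_class(sample, "3")
lysozymes = find_by_class(sample, "3.2.1.17")
```

Querying with `"3"` returns every hydrolase in the sample, while the full four-field number narrows the result to a single enzyme.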

Sequence Entries Cross-Link
Another addition to the PDB site concerns the Sequence Details available for every structure. The Sequence Details section of the Structure Explorer now points to the sequence entries in the major sequence databases corresponding to the particular structure being analyzed. This cross-link is currently limited to structures deposited after January 27, 1999.
Furthermore, the Structure Explorer interface has been enhanced with navigation arrows, for result sets larger than a single structure, that allow for browsing of individual structures within the set.

PDB WEB SITE STATISTICS - An analysis of access statistics for the primary PDB Web site shows that the number of hits has fluctuated over the past few months, with monthly totals ranging between 2,460,455 and 3,545,612 hits. The total number of files downloaded per month ranged between 1,876,663 and 2,780,938 files. A new record of 3,545,612 Web site hits was achieved during the month of September 2000, which is 427,880 more hits than the previous record month of March 2000. The number of files downloaded in September also reached a new record high of 2,780,938 files, which is 401,228 more files than were downloaded in March 2000.
We are pleased to see the tremendous amount of user interest during September, with a daily average of 122,262 hits and 95,894 files downloaded. School schedules probably have a significant impact on the large increase in the access statistics for that month, as our largest user group continues to be the U.S. educational institutions.
While the www.rcsb.org address continues to receive the most traffic, use of the mirror sites and beta test site continues to increase.
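The March 2000 totals are not stated directly, but they follow from the September records and the margins quoted above. The short sketch below works through that arithmetic; the function and variable names are chosen here for illustration.

```python
def previous_record(new_record, margin):
    """Previous record implied by a new record and the margin over it."""
    return new_record - margin

def percent_increase(new_record, margin):
    """Percentage by which the new record exceeds the previous one."""
    return 100.0 * margin / previous_record(new_record, margin)

# Figures quoted in the text for September 2000.
sept_hits, hits_margin = 3_545_612, 427_880
sept_files, files_margin = 2_780_938, 401_228

march_hits = previous_record(sept_hits, hits_margin)      # 3,117,732 hits
march_files = previous_record(sept_files, files_margin)   # 2,379,710 files
hits_growth = percent_increase(sept_hits, hits_margin)
files_growth = percent_increase(sept_files, files_margin)
```

On these figures, September exceeded March by roughly 14% in hits and 17% in files downloaded.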
[Table: PDB Web site access statistics (daily average and monthly totals) for September, August, and July 2000]

PDB ACTIVITIES AT SUMMER CONFERENCES - An important aspect of the PDB outreach and education efforts is to maintain discourse with the PDB user community. One effective means to achieve this is to represent the project at a variety of meetings and conferences which are of interest to our colleagues. This presents opportunities for users to meet with PDB members to interact and gain insights about the PDB's features and progress. Here are the highlights from this past summer's events:

American Crystallographic Association Meeting
The PDB hosted an exhibit booth at the American Crystallographic Association's Annual Meeting in St. Paul, MN, on July 22-27, 2000. PDB members were on hand to answer questions from the meeting attendees. A PDB Users Meeting held on July 24th was also part of the RCSB activities at this conference. A wide range of interesting inquiries and comments were presented by the many people who attended. We were pleased to have such a good turnout and were glad to have this opportunity to interact directly with the scientific community.

Protein Society Symposium
The PDB was represented at an exhibit booth at the 14th Annual Symposium of the Protein Society at the San Diego Convention Center on August 5-9, 2000. Symposium attendees were able to have their questions answered by the PDB team at the booth. We were pleased to meet so many PDB users at this event and thank those of you who came by to show your support. The proactive feedback from the user community is greatly appreciated!

Intelligent Systems for Molecular Biology Conference
The PDB hosted an exhibit booth at the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB) held at the Price Center on the campus of the University of California, San Diego in La Jolla, CA, on August 19-23, 2000. PDB members were available to respond to user questions and present informative materials such as tutorials to the conference attendees. A lot of constructive feedback was received from the user community, and we value the great suggestions that were given.
We hope to meet you all again at future conferences!

PDB CD-ROM SET 93 RELEASED - The PDB archive was released on this quarter's CD-ROM set 93.
The CD-ROMs contain the full release of PDB structure files, structure factor files, NMR constraints, and some contributed software and resources. The directory structure is the same as on earlier CD-ROMs, except that two directory names were changed with the April 2000 release: Distr was replaced by Entries and Nonst by Strucfac.
The current CD-ROM set includes the 12,592 structures as of the update on June 28, 2000. Five CD-ROMs are required to contain these structures in compressed (gzip) format. The next CD-ROM set, the October 2000 release, will include the 13,270 structures as of the update on September 27, 2000.
The PDB releases these CD-ROMs on a quarterly basis, at no cost.
Ordering information is available at http://www.rcsb.org/pdb/cdrom.html.

MOLECULES OF THE QUARTER: NUCLEOSOME, RESTRICTION ENZYMES, AND LYSOZYME - The PDB has continued to feature its popular "Molecule of the Month" piece. Written and drawn by David S. Goodsell, an assistant professor of molecular biology at The Scripps Research Institute in La Jolla, California, these articles provide an overview of significant milestones in the growth of the PDB's macromolecular structure data for a diverse audience. Here is a sample of the information that is presented in this feature:

Nucleosome: A Molecular Librarian, a Paradox -- July, 2000 -- This is an auspicious time for molecular biology. The wave of knowledge that began in 1944 with Avery's discovery of DNA as the genetic material, which led naturally to the atomic model of DNA proposed by Watson and Crick, and continued through detailed experiments to determine the genetic code, is now cresting with the release of the first draft of the human genome. This molecular text, written through billions of years of evolution, will provide untold insights into the molecular processes that underlie every aspect of our lives.
Each of our cells (or more correctly, nearly all of our cells) contains a copy of this genome, encoded in three billion base pairs of DNA. This information is precious and must be carefully guarded. Inside our cells, a collection of repair enzymes corrects chemical changes inflicted on the strands by environmental insults. But the delicate strands must also be protected from physical damage. This is the job of nucleosomes.
The job of the nucleosome is paradoxical, requiring it to perform two opposite functions simultaneously. On one hand, nucleosomes must be stable, forming tight, sheltering structures that compact the DNA and keep it from harm. On the other hand, nucleosomes must be labile enough to allow the information in the DNA to be used. Polymerases must be allowed access to the DNA, both to transcribe messenger RNA for building new proteins and to replicate the DNA when the cell divides. The method by which nucleosomes solve these opposed needs is not well understood, but may involve a partial unfolding of the DNA from around the nucleosome, one loop at a time, as the information in the DNA is read.
Apart from their function of safely packaging DNA, nucleosomes also modify the activity of the genes that they store. Each nucleosome is composed of eight "histone" proteins bundled tightly together at the center, encircled by two loops of DNA. The histone proteins, however, are not completely globular like most other proteins. They have long tails, which comprise nearly a quarter of their length. The tails extend outward from the compact nucleosome, reaching out to neighboring nucleosomes and binding them tightly together. The nucleus contains regulatory enzymes that chemically modify these tails to weaken their interactions. In this way, the cell makes particular genes more accessible to polymerases, allowing their particular information to be copied and used to build new proteins.
The histone proteins are perfectly designed for their jobs, so much so that histones are nearly identical in all non-bacterial organisms. Even slight modifications can be lethal. The surface of the histone octamer is decorated with positively charged amino acids. These interact strongly with the negatively-charged phosphate groups on the DNA. This serves to glue the DNA strand to the protein core. This is no simple task. DNA is normally a long, straight molecule, but in nucleosomes the DNA must be forcibly bent into these two tight circles.
An intact nucleosome may be viewed in the PDB entry 1AOI. Keep in mind that this structure only includes a short piece of DNA. In reality, these little nucleosomes are arrayed by the millions along long strands of DNA.

Restriction Enzymes: Bacteria Fight Back with Molecular Scissors -- August, 2000 -- Bacteria are under constant attack by bacteriophages, like the bacteriophage phiX174 described in an earlier Molecule of the Month. To protect themselves, many types of bacteria have developed a method to chop up any foreign DNA, such as that of an attacking phage. These bacteria build an endonuclease--an enzyme that cuts DNA--which is allowed to circulate in the bacterial cytoplasm, waiting for phage DNA. The endonucleases are termed "restriction enzymes" because they restrict the infection of bacteriophages.
Each type of restriction enzyme seeks out a single DNA sequence and precisely cuts it in one place. For instance, the enzyme EcoRI cuts the sequence GAATTC between the G and the A. Of course, roving endonucleases can be dangerous, so bacteria protect their own DNA by modifying it with methyl groups. These groups are added to adenine or cytosine bases (depending on the particular type of bacteria) in the major groove. The methyl groups block the binding of restriction enzymes, but they do not block the normal reading and replication of the genomic information stored in the DNA. DNA from an attacking bacteriophage will not have these protective methyl groups and will be destroyed. Each particular type of bacteria has a restriction enzyme (or several different ones) that cuts a specific DNA sequence, paired with a methyltransferase enzyme that protects this same sequence in the bacterial genome.
The booming field of biotechnology was made possible by the discovery of restriction enzymes in the early 1950s. With them, DNA may be cut in precise locations. A second enzyme--DNA ligase--may then be used to reassemble the pieces in any desired order. Together, these two enzymes allow researchers to assemble customized genomes. For instance, researchers can create designer bacteria that make insulin or growth hormone or add genes for disease resistance to agricultural plants.
An interesting property of restriction enzymes simplifies this molecular cutting and pasting. Restriction enzymes typically recognize a symmetrical sequence of DNA in which the top strand is the same as the bottom strand, read backwards. When the enzyme cuts the strand, it leaves overhanging chains termed "sticky ends" because the base pairs formed between the two overhanging portions will glue the two pieces together even though the backbone is cut. Sticky ends are an essential part of genetic engineering, allowing researchers to cut out little pieces of DNA and place them in specific places, where the sticky ends match.
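The symmetry and the sticky ends described above can be illustrated with a few lines of Python, using the EcoRI site GAATTC and its cut between the G and the A. The function names and the ten-base test sequence are made up for this sketch, which models only the top strand and assumes the cut falls in the first half of a symmetrical site.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the opposite strand, read in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]

def is_symmetrical(site):
    """True when the top strand equals the bottom strand read backwards."""
    return site == reverse_complement(site)

def cut_top_strand(seq, site, offset):
    """Split seq at each occurrence of site, cutting `offset` bases into it."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + offset])
        start = pos + offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

def sticky_end(site, offset):
    """Single-stranded overhang left when both strands are cut at `offset`."""
    return site[offset:len(site) - offset]

ECORI_SITE, ECORI_OFFSET = "GAATTC", 1  # cut between the G and the A

fragments = cut_top_strand("CCGAATTCGG", ECORI_SITE, ECORI_OFFSET)
overhang = sticky_end(ECORI_SITE, ECORI_OFFSET)
```

Because the site is symmetrical, the same cut on the opposite strand leaves a matching AATT overhang, which is why any two EcoRI-cut ends can pair with each other.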
The PDB contains structures for many restriction enzymes. The PDB entries 1RVC and 1RVA are two such examples of this type of molecule.

Lysozyme: A Cellular Guardian Attacking Bacteria -- September, 2000 -- Lysozyme protects us from the ever-present danger of bacterial infection. It is a small enzyme that attacks the protective cell walls of bacteria. Bacteria build a tough skin of carbohydrate chains, interlocked by short peptide strands, that braces their delicate membrane against the cell's high osmotic pressure. Lysozyme breaks these carbohydrate chains, destroying the structural integrity of the cell wall. The bacteria burst under their own internal pressure.
Alexander Fleming discovered lysozyme during a deliberate search for medical antibiotics. Over a period of years, he added everything that he could think of to bacterial cultures, looking for anything that would slow their growth. He discovered lysozyme by chance. One day, when he had a cold, he added a drop of mucus to the culture and, much to his surprise, it killed the bacteria. He had discovered one of our own natural defenses against infection. Unfortunately, lysozyme is a large molecule that is not particularly useful as a drug. It can be applied topically, but cannot rid the entire body of disease, because it is too large to travel between cells. Fortunately, Fleming continued his search, finding a true antibiotic drug five years later: penicillin.
Lysozyme protects many places that are rich in potential food for bacterial growth. The lysozyme in the PDB entry 2LYZ is from hen egg whites, where it serves to protect the proteins and fats that will nourish the developing chick. It was the first enzyme ever to have its structure solved. Our tears and mucus contain lysozyme to resist infection of our exposed surfaces. Our blood is the worst place to have bacteria grow, as they would be delivered to all corners of the body. In the blood, lysozyme provides some protection, along with the more powerful methods employed by the immune system.
Lysozyme is a small, stable enzyme, making it ideal for research into protein structure and function. Brian Matthews at the University of Oregon has performed a remarkable series of experiments, using lysozyme as the laboratory for study. He has performed hundreds of mutations on the lysozyme molecule made by a bacteriophage, changing one or more amino acids in the protein chain to a different one. He has studied the effect of removing large residues inside the protein, leaving a hole, or cramming a large amino acid inside, where it would not normally fit. He has attempted to create new active sites by creating new molecule-shaped pockets. Structures of hundreds of these mutant lysozymes are available at the PDB--so many, in fact, that lysozyme is the most common protein in the PDB. The structure from PDB entry 1L35 is a mutant. Two of its amino acids have been changed to cysteine, forming a new disulfide bridge in the mutant. The native enzyme can be found in the PDB entry 1LYD.
Lysozyme has a long active site cleft that binds to the bacterial carbohydrate chain. Based on computer modeling, it has been proposed that lysozyme distorts the shape of one sugar ring in the chain, making it easier to cleave (although other studies have proposed that different effects, like electrostatics, are more important). The structure found in PDB entry 148L shows what this distorted ring might look like. Normally, sugar rings adopt a zig-zag "chair" structure.

STATEMENT OF SUPPORT - The PDB is supported by funds from the National Science Foundation, the Office of Biology and Environmental Research at the Department of Energy, and two units of the National Institutes of Health: The National Institute of General Medical Sciences and the National Library of Medicine, in addition to resources and staff made available by the host institutions.
Contact information for all RCSB Members is available at http://www.rcsb.org/pdb/rcsb-group.html.
PDB job listings are available at http://www.rcsb.org/pdb/jobs.html.
Weekly PDB news is available on the Web at http://www.rcsb.org/pdb/latest_news.html.
This newsletter is available on the Web in HTML format at http://www.rcsb.org/pdb/newsletter/.