DATA DEPOSITION AND PROCESSING

5 easy steps for data deposition with ADIT

Structure Deposition Overview

Structures can be deposited to the wwPDB using the tools ADIT, ADIT-NMR, or AutoDep.

Deposited data are processed and validated against agreed-upon standards, then forwarded to the RCSB PDB for release into the archive. wwPDB members also maintain websites that provide different views of the data.

The following was presented this summer by Lead Annotator Jasmine Young at the American Crystallographic Association's Annual Meeting.

5 Easy Steps for Fast, Accurate, and Complete Data Deposition using the ADIT system

The first step is to Verify the Sequence (1) you are depositing. The sequence submitted to the PDB archive should include any residues missing due to lack of electron density, cloning artifacts and HIS tags that were not cleaved, and any mutations or substitutions. This protein and/or nucleotide sequence should be run through a sequence search tool (e.g., BLAST) 1 and compared with existing sequence database references. Depositors should check the sequence database correspondences and fix any discrepancies found, such as unobserved gaps.
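As a rough illustration of this kind of check, the sketch below lists where a deposited sequence differs from a database reference. It is only a minimal example built on Python's standard difflib module, not the comparison that BLAST or the annotation pipeline performs; the function name and both example sequences are hypothetical.

```python
from difflib import SequenceMatcher


def sequence_discrepancies(deposited, reference):
    """List the places where two one-letter-code sequences differ.

    Returns (operation, deposited_segment, reference_segment, position)
    tuples, where position is 1-based in the deposited sequence.
    """
    # autojunk=False stops difflib from ignoring very frequent residues
    # when the sequences are long.
    matcher = SequenceMatcher(None, deposited, reference, autojunk=False)
    differences = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            differences.append((op, deposited[i1:i2], reference[j1:j2], i1 + 1))
    return differences


# Hypothetical example: an uncleaved N-terminal HIS tag and one substitution.
reference = "MKTAYIAKQRQISFVKSHFSRQ"
deposited = "HHHHHH" + "MKTAYIAKQAQISFVKSHFSRQ"

for difference in sequence_discrepancies(deposited, reference):
    print(difference)
```

A "delete" entry marks residues present only in the deposited sequence (here, the tag), and a "replace" entry marks a substitution. Differences like these are expected when they reflect the actual sample; the point of the check is to confirm that each one is intentional.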

This sequence should then be entered, along with other information, into the pdb_extract (2) tool 2. This program takes data from crystal or NMR structure determination programs and automatically fills in the items necessary for deposition (including refinement statistics, data collection, phasing, and more) to generate a complete data file for deposition. pdb_extract can also be used to build a text file that can be edited and reused when depositing many related structures.

The mmCIF file generated by pdb_extract and, for crystal structures, the structure factor file should be uploaded to the Validation Server (3) 3. (For depositors not using pdb_extract, the Validation Server can also read PDB or mmCIF files from many refinement programs.) The Precheck step checks whether the coordinate file and structure factor file are in a readable format. The Validation step produces a report that contains information about any close contacts; bond distance and angle deviations; chirality errors; sequence/coordinate (mis)alignments; missing and extra atoms or residues; and distant waters. The report also includes output from the programs NUCheck, SFCHECK 4, and MolProbity 5. Depositors should carefully review these reports and make corrections where necessary.
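To give a feel for one item in the report, the sketch below flags "distant waters" in PDB-format coordinates: water molecules whose nearest non-water atom lies beyond a cutoff. This is not the Validation Server's implementation. The 5.0 angstrom cutoff, the function names, and the three example atoms are assumptions chosen for illustration; only the fixed column positions come from the PDB coordinate record layout.

```python
import math


def parse_atoms(pdb_text):
    """Read ATOM/HETATM records using the fixed columns of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),
                "res_name": line[17:20].strip(),
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


def distant_waters(atoms, cutoff=5.0):
    """Return (chain, residue number, distance) for waters far from all other atoms.

    The cutoff (in angstroms) is an illustrative value, not an official threshold.
    """
    waters = [a for a in atoms if a["res_name"] == "HOH"]
    others = [a for a in atoms if a["res_name"] != "HOH"]
    flagged = []
    for water in waters:
        nearest = min(
            (math.dist(water["xyz"], other["xyz"]) for other in others),
            default=math.inf,
        )
        if nearest > cutoff:
            flagged.append((water["chain"], water["res_seq"], nearest))
    return flagged


def atom_line(record, serial, name, res_name, chain, res_seq, x, y, z):
    """Hypothetical helper that formats one coordinate record for the demo."""
    return (f"{record:<6}{serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


# Made-up coordinates: one protein atom, one nearby water, one distant water.
example = "\n".join([
    atom_line("ATOM", 1, "CA", "ALA", "A", 1, 0.0, 0.0, 0.0),
    atom_line("HETATM", 2, "O", "HOH", "A", 101, 2.8, 0.0, 0.0),
    atom_line("HETATM", 3, "O", "HOH", "A", 102, 12.0, 0.0, 0.0),
])
print(distant_waters(parse_atoms(example)))
```

Running a quick check like this before uploading can catch obvious problems early, but the report from the Validation Server remains the reference.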

When depositing structures with ligands, users should Check Any Ligands (4) to find chemical component ids (three letter codes) for existing ligands. If the ligand is not found in the Chemical Component Dictionary, a chemical diagram (with bond order), IUPAC name, synonyms, formula, and potential three letter code should be emailed to deposit@deposit.rcsb.org.
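For depositors who keep a local copy of the dictionary, a lookup along these lines can show whether a proposed three letter code is already taken. The sketch assumes the dictionary is an mmCIF text file in which each component starts with a `data_<ID>` line; the function names and the two-entry excerpt are hypothetical stand-ins for the real download.

```python
import re


def component_ids(dictionary_text):
    """Collect component ids from the data_<ID> block headers."""
    return {
        match.group(1).upper()
        for match in re.finditer(r"^data_(\S+)", dictionary_text, flags=re.MULTILINE)
    }


def is_code_available(code, dictionary_text):
    """True if the proposed code does not appear among the existing ids."""
    return code.upper() not in component_ids(dictionary_text)


# Tiny made-up excerpt standing in for the full dictionary file.
excerpt = """data_ATP
_chem_comp.id ATP
data_HEM
_chem_comp.id HEM
"""

print(is_code_available("ATP", excerpt))  # existing component
print(is_code_available("LG1", excerpt))  # hypothetical new code
```

A code that appears free in a local copy may still be assigned elsewhere, so the final three letter code is settled with the annotation staff.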

Once you are comfortable with the reports generated from the Validation Server and you have either found or submitted your ligand, the structure can be deposited using ADIT (5) 6.

The web-based version of ADIT asks users to upload the coordinate file and experimental data. The tool can then be used to enter any missing information. At this point, depositors should also indicate if their sequence can be released before the entire entry is released. After the deposition has been reviewed, the structure can be submitted. A PDB ID will be returned automatically.

A desktop version of ADIT is available for users who are behind firewalls.

After your entry is deposited, the annotation staff will work to represent your PDB data in the best possible way. This work involves:

  • Reviewing the entry for self-consistency
  • Confirming that the entry title matches the content of the deposited entry
  • Correcting any format errors in data and coordinates
  • Checking the sequence
  • Inserting sequence database references
  • Providing a protein name and synonyms
  • Checking the scientific name of the source organism
  • Confirming the chemical consistency between ligand name and coordinates
  • Adding biological assembly information
  • Checking the structure visually
  • Generating and reviewing validation reports
  • Finding citation references with PubMed 7

Report findings are compiled and sent back to the depositor. If no problems are found with the entry by the annotator or the depositor, then it is considered automatically approved and is ready to be released based upon the deposited release status. Entries can be released immediately, held until publication of the corresponding primary citation, or held until a particular date. Depositions cannot be held longer than one year.
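The release options and the one-year limit can be expressed as a small check. The sketch below is illustrative only: it assumes the year is counted from the deposition date, and the enum, function names, and dates are hypothetical rather than part of ADIT.

```python
from datetime import date
from enum import Enum


class ReleaseStatus(Enum):
    RELEASE_NOW = "release immediately"
    HOLD_FOR_PUBLICATION = "hold until publication"
    HOLD_UNTIL_DATE = "hold until a particular date"


def latest_release_date(deposited_on):
    """One year after deposition (assumed starting point for the limit)."""
    try:
        return deposited_on.replace(year=deposited_on.year + 1)
    except ValueError:
        # 29 February has no counterpart in the following year.
        return deposited_on.replace(year=deposited_on.year + 1, day=28)


def is_hold_allowed(deposited_on, hold_until):
    """True if the requested hold date is within one year of deposition."""
    return deposited_on <= hold_until <= latest_release_date(deposited_on)


# Hypothetical deposition date and requested hold dates.
deposited_on = date(2007, 10, 1)
print(ReleaseStatus.HOLD_UNTIL_DATE.value)
print(is_hold_allowed(deposited_on, date(2008, 6, 1)))
print(is_hold_allowed(deposited_on, date(2009, 1, 1)))
```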

Entries currently released by the wwPDB follow the PDB format as described in the PDB Contents Guide Version 3.1 and the mmCIF format that complies with the current PDB Exchange Dictionary (PDBx) v1.045. These formats contain the new features incorporated as part of the remediation project, including:

  • Two letter codes for DNA labeling
  • Better representations for complex assemblies
  • Standardized atom nomenclature that follows IUPAC naming

Deposition updates and questions about this process should be sent to deposit@deposit.rcsb.org.

Annotators at the RCSB PDB

In addition to curating data, annotation staff at the RCSB PDB are involved in a variety of educational and outreach projects, attend professional society meetings, and assist in software development. This position offers the opportunity to participate in an exciting project with significant impact on the scientific community.

If interested, please send your resume to Dr. Helen M. Berman at pdbjobs@rcsb.rutgers.edu.

Deposition Resources

1) pdb_extract: pdb-extract.rcsb.org

2) Validation Suite: deposit.rcsb.org/validate or pdbdep.protein.osaka-u.ac.jp/validate

3) BLAST: www.ebi.ac.uk/blast2 or www.ncbi.nih.gov/BLAST

4) Chemical Component Dictionary: remediation.wwpdb.org/downloads.html

5) ADIT: deposit.rcsb.org/adit or pdbdep.protein.osaka-u.ac.jp/adit; ADIT-NMR: batfish.bmrb.wisc.edu/bmrb-adit


Photo: wwPDB Annotators at the September Retreat

References

1 D.L. Wheeler, T. Barrett, D.A. Benson, S.H. Bryant, K. Canese, V. Chetvernin, D.M. Church, M. DiCuccio, R. Edgar, S. Federhen, L.Y. Geer, Y. Kapustin, O. Khovayko, D. Landsman, D.J. Lipman, T.L. Madden, D.R. Maglott, J. Ostell, V. Miller, K.D. Pruitt, G.D. Schuler, E. Sequeira, S.T. Sherry, K. Sirotkin, A. Souvorov, G. Starchenko, R.L. Tatusov, T.A. Tatusova, L. Wagner, and E. Yaschenko (2007) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 35(Database issue): D5-12.

2 H. Yang, V. Guranovic, S. Dutta, Z. Feng, H.M. Berman, and J. Westbrook (2004) Automated and accurate deposition of structures solved by X-ray diffraction to the Protein Data Bank. Acta Crystallogr D Biol Crystallogr. 60: 1833-1839.

3 J. Westbrook, Z. Feng, K. Burkhardt, and H.M. Berman (2003) Validation of protein structures for the Protein Data Bank. Meth Enz. 374: 370-385.

4 A.A. Vaguine, J. Richelle, and S.J. Wodak (1999) SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Crystallogr D Biol Crystallogr. 55: 191-205.

5 I.W. Davis, L.W. Murray, J.S. Richardson, and D.C. Richardson (2004) MOLPROBITY: structure validation and all-atom contact analysis for nucleic acids and their complexes. Nucleic Acids Res. 32(Web Server issue): W615-9.

6 S. Dutta, K. Burkhardt, W.F. Bluhm, and H.M. Berman (2005) Using the tools and resources of the RCSB Protein Data Bank. Current Protocols in Bioinformatics: 1.9.1-1.9.40.

7 The UniProt Consortium (2007) The Universal Protein Resource (UniProt). Nucleic Acids Res. 35(Database issue): D193-7.



Deposition Statistics

In the first three quarters of 2007, 6,358 structures were deposited to the PDB archive and processed by the wwPDB. Of the structures deposited, 66.9% were deposited with a release status of "hold until publication"; 19.2% were released as soon as annotation of the entry was complete; and 13.9% were held until a particular date. Of these entries, 85.4% were determined by X-ray crystallographic methods and 14.2% by NMR methods. Overall, 86.6% of these structures were deposited with experimental data: 94.1% of the crystal structures were deposited with structure factors, and 43.6% of the NMR structures were deposited with restraints.
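The percentages above can be turned into rough entry counts with a few lines of arithmetic. The counts this sketch produces are approximations reconstructed from rounded percentages, not figures reported by the wwPDB, and the variable and function names are illustrative.

```python
total_depositions = 6358

# Release status percentages as quoted in the text.
release_status = {
    "hold until publication": 66.9,
    "released after annotation": 19.2,
    "hold until a particular date": 13.9,
}


def approximate_counts(total, percentages):
    """Convert percentages of a total into rounded, approximate counts."""
    return {label: round(total * pct / 100) for label, pct in percentages.items()}


counts = approximate_counts(total_depositions, release_status)
for label, count in counts.items():
    print(f"{label}: about {count} entries")

# Because each percentage is rounded, the counts need not sum exactly
# to the total number of depositions.
print(sum(counts.values()), "vs", total_depositions)
```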