5 easy steps for data deposition with ADIT
Structure Deposition Overview
Structures can be deposited to the wwPDB using the tools ADIT,
ADIT-NMR, or AutoDep.
Data deposited to the archive are processed using agreed-upon
standards for full validation. The processed data are then
forwarded to the RCSB PDB for release into the archive. wwPDB
members also maintain websites that provide different views of
the data.
The following was presented this summer by Lead Annotator
Jasmine Young at the American Crystallographic Association's
Annual Meeting.
5 Easy Steps for Fast, Accurate, and Complete Data Deposition
using the ADIT system
The first step is to Verify the Sequence (1) you are depositing.
The sequence submitted to the PDB archive should include any
residues missing due to lack of electron density, any cloning
artifacts and HIS tags that were not cleaved, and any mutations
or substitutions. This protein and/or nucleotide sequence should
be searched against a sequence database (e.g., with BLAST [1])
and compared with any existing sequence database references.
Depositors should check sequence database correspondences and
fix any sequence discrepancies found, such as unobserved gaps.
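As a rough illustration of the comparison described above, the
sketch below uses Python's standard difflib module to list the
regions where a deposited sequence differs from a database
reference. This is not how BLAST or the wwPDB tools work; the
function name and both sequences are hypothetical and serve only
to show the kind of discrepancy a depositor would look for.

```python
from difflib import SequenceMatcher


def sequence_discrepancies(deposited, reference):
    """Return (tag, deposited_segment, reference_segment) for each
    region where the two one-letter sequences do not match."""
    matcher = SequenceMatcher(None, deposited, reference, autojunk=False)
    issues = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            issues.append((tag, deposited[i1:i2], reference[j1:j2]))
    return issues


# Hypothetical sequences, for illustration only.
reference = "MKTAYIAKQRQISFVKSHFSRQ"
# Same protein with an uncleaved His tag and one substitution.
deposited = "HHHHHHMKTAYIAKQRQISFVKAHFSRQ"

for tag, dep_seg, ref_seg in sequence_discrepancies(deposited, reference):
    print(tag, repr(dep_seg), repr(ref_seg))
```

Each reported region is something to either correct or explain
at deposition, such as an expression tag or an engineered
mutation.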
This sequence should then be entered, along with other
information, when you use the pdb_extract (2) tool [2]. This
program takes data from crystal or NMR structure determination
programs and automatically fills in the items necessary for
deposition (including refinement statistics, data collection,
phasing, and more) to generate a complete data file for
deposition. pdb_extract can also be used to build a text file
that can be edited and reused when depositing many related
structures.
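Before uploading, a depositor may want to confirm that key
items in the generated file are actually filled in. The sketch
below is not part of pdb_extract and is not a general mmCIF
parser: it only reads single-line "_category.item value" pairs
and ignores loops and multi-line fields. The item names are
standard PDBx/mmCIF items, but the selection of items to check,
the function names, and the example values are assumptions made
for illustration.

```python
# Which items to check is an assumption, not a wwPDB requirement list.
ITEMS_TO_CHECK = [
    "_cell.length_a",
    "_symmetry.space_group_name_H-M",
    "_reflns.d_resolution_high",
    "_refine.ls_R_factor_R_work",
    "_refine.ls_R_factor_R_free",
]


def read_simple_items(lines):
    """Collect `_category.item value` pairs written on one line."""
    items = {}
    for line in lines:
        if not line.startswith("_"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            # Drop surrounding whitespace and simple CIF quoting.
            items[parts[0]] = parts[1].strip().strip("'\"")
    return items


def missing_items(lines, required=ITEMS_TO_CHECK):
    """Return required items that are absent or hold a CIF
    placeholder ('?' for unknown, '.' for not applicable)."""
    items = read_simple_items(lines)
    return [name for name in required if items.get(name, "?") in ("?", ".")]


# Hypothetical file fragment, for illustration only.
example = """\
data_example
_cell.length_a 52.30
_symmetry.space_group_name_H-M 'P 21 21 21'
_reflns.d_resolution_high 1.80
_refine.ls_R_factor_R_work 0.195
_refine.ls_R_factor_R_free ?
""".splitlines()

print(missing_items(example))
```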
The mmCIF file generated from pdb_extract and, for crystal
structures, the structure factor file should be uploaded to the
Validation Server (3) [3]. (For depositors not using pdb_extract,
the Validation Server can also read files in PDB or mmCIF format
from many refinement programs.) The Precheck step checks whether
the coordinate file and structure factor file are in a readable
format. The Validation step produces a report that contains
information about any close contacts; bond distance and angle
deviations; chirality errors; sequence/coordinate
(mis)alignments; missing and extra atoms or residues; and
distant waters. The report also includes output from the
programs NUCheck, SFCHECK [4], and MolProbity [5]. Depositors
should carefully review these reports and make corrections where
necessary.
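To give a sense of what one of these checks involves, the
sketch below flags "distant waters" in PDB-format coordinates:
water molecules whose nearest non-water atom lies beyond a
cutoff. It is not the Validation Server's implementation. The
5.0 angstrom cutoff, the function names, and the example
coordinates are all assumptions chosen for illustration.

```python
import math


def parse_atoms(pdb_lines):
    """Yield (residue_name, (x, y, z)) from ATOM/HETATM records.

    Uses the fixed columns of the PDB format: residue name in
    columns 18-20 and x, y, z in columns 31-38, 39-46, and 47-54.
    """
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            res_name = line[17:20].strip()
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield res_name, xyz


def distant_waters(pdb_lines, cutoff=5.0):
    """Return (water_xyz, nearest_distance) for each water farther
    than `cutoff` angstroms from every non-water atom.
    The default cutoff is an assumed value."""
    atoms = list(parse_atoms(pdb_lines))
    waters = [xyz for name, xyz in atoms if name == "HOH"]
    others = [xyz for name, xyz in atoms if name != "HOH"]
    flagged = []
    for water in waters:
        nearest = min((math.dist(water, other) for other in others),
                      default=math.inf)
        if nearest > cutoff:
            flagged.append((water, nearest))
    return flagged


def record(kind, serial, atom, res, chain, res_seq, x, y, z):
    """Format one fixed-column coordinate record for the example."""
    return (f"{kind:<6}{serial:>5} {atom:<4} {res:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Hypothetical coordinates, for illustration only.
example = [
    record("ATOM", 1, " CA ", "ALA", "A", 1, 0.0, 0.0, 0.0),
    record("HETATM", 2, " O  ", "HOH", "A", 101, 0.0, 0.0, 3.0),
    record("HETATM", 3, " O  ", "HOH", "A", 102, 0.0, 0.0, 12.0),
]

for xyz, dist in distant_waters(example):
    print(f"water at {xyz} is {dist:.1f} A from the nearest non-water atom")
```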
When depositing structures with ligands, users should Check Any
Ligands (4) to find chemical component ids (three letter codes)
for existing ligands. If the ligand is not found in the Chemical
Component Dictionary, a chemical diagram (with bond order),
IUPAC name, synonyms, formula, and potential three letter code
should be emailed to deposit@deposit.rcsb.org.
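A quick way to see which ligands need attention is to list the
residue names used in HETATM records and compare them with the
known component ids. In the sketch below, the small set of ids
is only a stand-in for the Chemical Component Dictionary, and
the function name, the code "LG1", and the coordinates are
hypothetical.

```python
def unrecognized_ligands(pdb_lines, known_ids):
    """Return HETATM residue names (PDB columns 18-20) that are not
    in `known_ids`, sorted alphabetically."""
    found = {line[17:20].strip()
             for line in pdb_lines
             if line.startswith("HETATM")}
    return sorted(found - set(known_ids))


# Tiny stand-in for the Chemical Component Dictionary (assumption).
KNOWN_IDS = {"ATP", "GOL", "HEM", "HOH"}

# Hypothetical records, for illustration only.
example = [
    "HETATM    1  PG  ATP A 501      10.000  10.000  10.000",
    "HETATM    2  C1  LG1 A 502      12.000  11.000   9.000",
    "HETATM    3  O   HOH A 601      15.000  15.000  15.000",
]

print(unrecognized_ligands(example, KNOWN_IDS))
```

Any code reported here would be a candidate for the email
described above.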
Once you are comfortable with the reports generated from the
Validation Server and you have either found or submitted your
ligand, the structure can be deposited using ADIT (5) [6]. The
web-based version of ADIT asks users to upload the coordinate
file and experimental data. The tool can then be used to enter
any missing information. At this point, depositors should also
indicate if their sequence can be released before the entire
entry is released. After the deposition has been reviewed, the
structure can be submitted. A PDB ID will be returned
automatically.
A desktop version of ADIT is available for users who are behind
firewalls. After your entry is deposited, the annotation staff
will work to represent your PDB data in the best possible way.
This work involves:
- Reviewing entry for self-consistency
- Confirming that the entry title matches the content of the
  deposited entry
- Correcting any format errors in data and coordinates
- Checking the sequence
- Inserting sequence database references
- Providing a protein name and synonyms
- Checking the scientific name of the source organism
- Confirming the chemical consistency between ligand name and
  coordinates
- Adding biological assembly information
- Checking the structure visually
- Generating and reviewing validation reports
- Finding citation references with PubMed [7]
Report findings are compiled and sent back to the depositor. If
no problems are found with the entry by the annotator or the
depositor, then it is considered automatically approved and is
ready to be released based upon the deposited release status.
Entries can be released immediately, held until publication of
the corresponding primary citation, or held until a particular
date. Depositions cannot be held longer than one year.
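The release options and the one-year limit can be expressed as
a small rule. The sketch below is only an illustration: the
status names and function names are hypothetical, and it assumes
the one-year limit is counted from the deposition date, which
the text above does not state explicitly.

```python
from datetime import date
from enum import Enum


class ReleaseStatus(Enum):
    # Hypothetical names for the three options described above.
    IMMEDIATE = "release immediately"
    HOLD_FOR_PUBLICATION = "hold until publication"
    HOLD_UNTIL_DATE = "hold until a particular date"


def one_year_after(day):
    """Return the same calendar day one year later."""
    try:
        return day.replace(year=day.year + 1)
    except ValueError:
        # 29 February has no counterpart in the following year.
        return day.replace(year=day.year + 1, day=28)


def hold_deadline(deposited, status, requested=None):
    """Return the latest date an entry may stay on hold, or None
    when the entry is released immediately.

    Assumes the one-year limit starts at the deposition date.
    """
    if status is ReleaseStatus.IMMEDIATE:
        return None
    limit = one_year_after(deposited)
    if status is ReleaseStatus.HOLD_UNTIL_DATE:
        if requested is None:
            raise ValueError("a hold date is required for HOLD_UNTIL_DATE")
        return min(requested, limit)
    return limit


# A requested hold beyond one year is capped at the limit.
print(hold_deadline(date(2007, 10, 1), ReleaseStatus.HOLD_UNTIL_DATE,
                    requested=date(2009, 1, 1)))
```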
Entries currently released by the wwPDB follow the PDB format as
described in the PDB Contents Guide Version 3.1 and the mmCIF
format that complies with the current PDB Exchange Dictionary
(PDBx) v1.045. These formats contain the new features
incorporated as part of the remediation project, including:
- Two letter codes for DNA labeling
- Better representations for complex assemblies
- Standardized atom nomenclature that follows IUPAC naming
Deposition updates and questions about this process should be
sent to
deposit@deposit.rcsb.org.
Annotators at the RCSB PDB
In addition to curating data, annotation staff at the RCSB PDB
are involved in a variety of educational and outreach projects,
attend professional society meetings, and assist in software
development. This position offers the opportunity to
participate in an exciting project with significant impact on
the scientific community.
If interested, please send your resume to Dr. Helen M. Berman
at
pdbjobs@rcsb.rutgers.edu.
Deposition Resources
1) pdb_extract: pdb-extract.rcsb.org
2) Validation Suite: deposit.rcsb.org/validate or
   pdbdep.protein.osaka-u.ac.jp/validate
3) BLAST: www.ebi.ac.uk/blast2 or www.ncbi.nih.gov/BLAST
4) Chemical Component Dictionary:
   remediation.wwpdb.org/downloads.html
5) ADIT: deposit.rcsb.org/adit or
   pdbdep.protein.osaka-u.ac.jp/adit
   ADIT-NMR: batfish.bmrb.wisc.edu/bmrb-adit
wwPDB Annotators at the September
Retreat
References
1. D.L. Wheeler, T. Barrett, D.A. Benson, S.H. Bryant, K.
   Canese, V. Chetvernin, D.M. Church, M. DiCuccio, R. Edgar, S.
   Federhen, L.Y. Geer, Y. Kapustin, O. Khovayko, D. Landsman,
   D.J. Lipman, T.L. Madden, D.R. Maglott, J. Ostell, V. Miller,
   K.D. Pruitt, G.D. Schuler, E. Sequeira, S.T. Sherry, K.
   Sirotkin, A. Souvorov, G. Starchenko, R.L. Tatusov, T.A.
   Tatusova, L. Wagner, and E. Yaschenko (2007) Database
   resources of the National Center for Biotechnology
   Information. Nucleic Acids Res. 35(Database issue): D5-12.
2. H. Yang, V. Guranovic, S. Dutta, Z. Feng, H.M. Berman, and J.
   Westbrook (2004) Automated and accurate deposition of
   structures solved by X-ray diffraction to the Protein Data
   Bank. Acta Crystallogr D Biol Crystallogr. 60: 1833-1839.
3. J. Westbrook, Z. Feng, K. Burkhardt, and H.M. Berman (2003)
   Validation of protein structures for the Protein Data Bank.
   Methods Enzymol. 374: 370-385.
4. A.A. Vaguine, J. Richelle, and S.J. Wodak (1999) SFCHECK: a
   unified set of procedures for evaluating the quality of
   macromolecular structure-factor data and their agreement with
   the atomic model. Acta Crystallogr D Biol Crystallogr. 55:
   191-205.
5. I.W. Davis, L.W. Murray, J.S. Richardson, and D.C. Richardson
   (2004) MOLPROBITY: structure validation and all-atom contact
   analysis for nucleic acids and their complexes. Nucleic Acids
   Res. 32(Web Server issue): W615-9.
6. S. Dutta, K. Burkhardt, W.F. Bluhm, and H.M. Berman (2005)
   Using the tools and resources of the RCSB Protein Data Bank.
   Current Protocols in Bioinformatics: 1.9.1-1.9.40.
7. The UniProt Consortium (2007) The Universal Protein Resource
   (UniProt). Nucleic Acids Res. 35(Database issue): D193-7.
Deposition Statistics
In the first three quarters of 2007, 6358 structures were
deposited to the PDB archive and processed by the wwPDB. Of the
structures deposited, 66.9% were deposited with a release
status of "hold until publication"; 19.2% were released as soon
as annotation of the entry was complete; and 13.9% were held
until a particular date. 85.4% of these entries were determined
by X-ray crystallographic methods; 14.2% were determined by NMR
methods. 86.6% of these structures were deposited with
experimental data. 94.1% of the crystal structures were
deposited with structure factors; 43.6% of NMR structures were
deposited with restraints.
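The release-status percentages above can be turned back into
approximate entry counts. Because the published percentages are
rounded to one decimal place, the counts below are estimates
only and need not sum exactly to 6358; the variable and function
names are chosen for illustration.

```python
TOTAL_DEPOSITED = 6358  # first three quarters of 2007

# Percentages as reported above.
RELEASE_STATUS_PERCENT = {
    "hold until publication": 66.9,
    "released after annotation": 19.2,
    "hold until a particular date": 13.9,
}


def approximate_counts(total, percentages):
    """Convert rounded percentages into estimated entry counts."""
    return {label: round(total * pct / 100)
            for label, pct in percentages.items()}


for label, count in approximate_counts(TOTAL_DEPOSITED,
                                       RELEASE_STATUS_PERCENT).items():
    print(f"{label}: about {count} entries")
```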