Newsletter | Spring 2017 ⋅ Number 73

Data Deposition and Annotation

In the first quarter of 2017, 3800 experimentally-determined structure coordinate entries were deposited to the archive.

65.7% were deposited with a release status of hold until publication; 29.6% were released as soon as annotation of the entry was complete; and 4.7% were held until a particular date.

93.4% of these entries were determined by X-ray crystallographic methods; 3.3% by NMR methods; and 3.2% by 3DEM.

During the same period, 3350 structures and 222 EMDB maps were released in the PDB.

The 10th International Biocuration Conference was held at Stanford University, Palo Alto, California, USA on March 26-29, 2017.

At the 2016 meeting, it was announced that RCSB PDB's John Westbrook would receive the society's first Career Award in recognition of his contributions to the field of biocuration.

More than twenty years ago, while still a graduate student, John recognized the importance of a well-defined data model for ensuring the delivery of high quality and reliable data to the user community.

He was the principal architect of the PDBx/mmCIF data representation for biological macromolecular data. It is based on a simple, context-free grammar free of column width limitations. Data are presented in either key-value or tabular form. All relationships between common data items (e.g., atom and residue identifiers) are explicitly documented within the PDBx Exchange Dictionary. This permits software applications to evaluate and validate referential integrity within any PDB entry. The current PDBx/mmCIF dictionary contains over 4000 definitions for the experiments involved in macromolecular structure determination and descriptions of the models themselves. This dictionary is the basis for all deposition and annotation procedures used by the wwPDB and is the Master Format for the PDB archive. John also established the Chemical Component Dictionary to maintain and distribute small molecule chemical reference data in the PDB.
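To make the key-value and tabular forms concrete, here is a minimal Python sketch that reads a heavily simplified PDBx/mmCIF-like fragment and checks one parent-child relationship. The sample data are invented for illustration, and the helper names (parse_simple_cif, dangling_references) are hypothetical. The parser ignores quoting, multi-line text fields, and other parts of the real grammar, and the relationship between _atom_site.label_asym_id and _struct_asym.id is hardcoded here, whereas real validation software would read such relationships from the dictionary. Production code should rely on an established mmCIF parser.

```python
# Invented sample: two key-value items followed by two tabular (loop_) categories.
SAMPLE = """\
data_DEMO
_entry.id DEMO
_cell.length_a 50.0
loop_
_struct_asym.id
_struct_asym.entity_id
A 1
B 2
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_asym_id
1 N A
2 CA A
3 N C
"""


def parse_simple_cif(text):
    """Parse a simplified mmCIF subset into {category: [row_dict, ...]}.

    Assumes whitespace-separated values with no quoting or multi-line fields.
    """
    categories = {}
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("data_"):
            i += 1  # data block header; nothing to store in this sketch
        elif line == "loop_":
            i += 1
            names = []
            while i < len(lines) and lines[i].startswith("_"):
                names.append(lines[i])
                i += 1
            category = names[0].split(".")[0].lstrip("_")
            items = [name.split(".", 1)[1] for name in names]
            rows = categories.setdefault(category, [])
            # Every following line is one table row until the next keyword.
            while i < len(lines) and not lines[i].startswith(("_", "loop_", "data_")):
                rows.append(dict(zip(items, lines[i].split())))
                i += 1
        elif line.startswith("_"):
            # Key-value form: "_category.item value"
            name, value = line.split(None, 1)
            category, item = name.lstrip("_").split(".", 1)
            rows = categories.setdefault(category, [{}])
            rows[0][item] = value.strip()
            i += 1
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return categories


def dangling_references(categories, child, child_item, parent, parent_item):
    """Return child rows whose key has no matching row in the parent category."""
    parent_keys = {row[parent_item] for row in categories.get(parent, [])}
    return [row for row in categories.get(child, [])
            if row[child_item] not in parent_keys]


cats = parse_simple_cif(SAMPLE)
bad = dangling_references(cats, "atom_site", "label_asym_id", "struct_asym", "id")
for row in bad:
    print("atom", row["id"], "refers to unknown asym id", row["label_asym_id"])
```

In the invented sample, atom 3 points at asym id C, which is not listed in struct_asym, so it is the one row reported. Because every item carries its category name, a check like this needs no knowledge of column positions, which is the practical benefit of a format without column width limitations.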

In his award lecture at this year's meeting, John described his Perspectives on the Evolution of the Data Architecture.

In addition, Jasmine Young presented a poster on the Biocuration of Experimentally-Determined 3D Macromolecular Structures and their Complexes at the wwPDB. wwPDB Biocurators were recently highlighted in a Cell Press CrossTalk blog post entitled "OneDep is a step in the right direction."




OneDep is the wwPDB’s unified system for the deposition, biocuration, and validation of macromolecular structures globally across all wwPDB, EMDB, and BMRB deposition sites. It was developed to meet the evolving requirements of the scientific community to archive structural data over the coming decades.

OneDep provides a user-friendly deposition interface and improved structure validation with the benefit of recommendations from expert task forces representing the respective methodological communities. Because OneDep supports a more automated workflow, biocuration is also processed more efficiently.

As Milka Kostic, the Senior Editor at Structure and Cell Chemical Biology, notes in the Cell Press CrossTalk blog, OneDep’s support for both depositors and Biocurators is a step in the right direction:

What impresses me the most about OneDep, as well as previous systems, is how much of this global enterprise rests on the shoulders of Biocurators, the unsung heroes of the PDB. Although we have been working to outsource a great deal of checks and balances to algorithms and machines, the fine checking, annotation, curation, and decision-making are done by a dedicated team of the PDB Biocurators.

The OneDep system is described in detail in a recent issue of Structure:

OneDep: Unified wwPDB System for Deposition, Biocuration, and Validation of Macromolecular Structures in the Protein Data Bank (PDB) Archive J. Young, J. D. Westbrook, Z. Feng, R. Sala, E. Peisach, T. J. Oldfield, S. Sen, A. Gutmanas, D. R. Armstrong, J. M. Berrisford, L. Chen, M. Chen, L. Di Costanzo, D. Dimitropoulos, G. Gao, S. Ghosh, S. Gore, V. Guranovic, P. M. S. Hendrickx, B. P. Hudson, R. Igarashi, Y. Ikegawa, N. Kobayashi, C. L. Lawson, Y. Liang, S. Mading, L. Mak, M. S. Mir, A. Mukhopadhyay, A. Patwardhan, I. Persikova, L. Rinaldi, E. Sanz-Garcia, M. R. Sekharan, C. Shao, G. J. Swaminathan, L. Tan, E. L. Ulrich, G. Van Ginkel, R. Yamashita, H. Yang, M. A. Zhuravleva, M. Quesada, G. J. Kleywegt, H. M. Berman, J. L. Markley, H. Nakamura, S. Velankar, S. K. Burley. (2017) Structure 25: 536-545 doi: 10.1016/j.str.2017.01.004

The wwPDB is preparing to update the PDBx/mmCIF model files for all entries in the PDB archive to version 5 (V5) of the PDBx/mmCIF dictionary. When the update is complete, all PDB model files will have better-organized content and will conform to the revised data model used within the wwPDB OneDep System.

In May, updated model files for all experimental methods will be made available on a new PDB FTP server (ftp://ftp-beta.wwpdb.org/pub/pdb/data/structures/), and the corresponding PDBx/mmCIF dictionary will be released. Users are strongly encouraged to review and test these updated data files.

In July, the current PDB FTP archive will be updated with new files corresponding to the V5 PDBx/mmCIF dictionary. Visit wwpdb.org for current documentation.

The wwPDB partners are pleased to announce that updated validation reports for all X-ray, NMR, and 3DEM structures deposited in the PDB archive are available. Updates include new percentile statistics reflecting the state of the PDB archive on December 31, 2016, and updated versions of the Mogul software (2017) and CSD archive (as538be).

Further information and sample validation reports are available.