Data Deposition/Biocuration Services and Archive Management

In the fourth quarter of 2017, 2,873 experimentally determined structures were deposited to the PDB archive, for a total of 13,049 entries deposited in the year. In 2016, 11,614 entries were deposited.

Of the structures deposited in 2017, 79.6% were deposited with a release status of hold until publication; 15.9% were released as soon as annotation of the entry was complete; and 4.6% were held until a particular date. 91.2% of these entries were determined by X-ray crystallographic methods; 3.6% were determined by NMR methods.

11,129 new PDB structures were released in 2017. They account for 8.2% of the year-end total holdings of 136,472.
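The percentages above can be rechecked from the reported counts. The following is a minimal sketch, not part of any wwPDB tooling; the function names are made up for illustration, and the year-over-year growth figure is derived here from the 2016 and 2017 deposition totals rather than quoted from the report.

```python
def share_percent(part: int, total: int) -> float:
    """Return part as a percentage of total."""
    return part / total * 100


def growth_percent(previous: int, current: int) -> float:
    """Return the percentage change from previous to current."""
    return (current - previous) / previous * 100


# Counts taken from the text above.
released_2017 = 11_129
holdings_year_end = 136_472
deposited_2016 = 11_614
deposited_2017 = 13_049

# Share of year-end holdings released in 2017 (reported as 8.2%).
print(f"{share_percent(released_2017, holdings_year_end):.1f}%")

# Derived figure (not stated in the report): growth in depositions, 2016 to 2017.
print(f"{growth_percent(deposited_2016, deposited_2017):.1f}%")
```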

An article focused on the structure validation reports produced by wwPDB is now available: Validation of Structures in the Protein Data Bank (2017) Structure 25: 1916-1927 doi: 10.1016/j.str.2017.10.009.

The paper describes how validation reports for PDB structures determined by X-ray crystallography were first made available to depositors in 2013 and added to the PDB archive in 2014. In early 2016, following the update of the OneDep system, reports for structures solved by nuclear magnetic resonance and 3D electron microscopy were made available to the depositors and later incorporated into the PDB archive.

These reports are based on the published recommendations of expert Validation Task Forces appointed by the wwPDB and EMDataBank partners. Preliminary reports can be generated at a stand-alone webserver (validate.wwpdb.org), during deposition in OneDep (deposit.wwpdb.org), or programmatically via a web service API. The wwPDB partners strongly recommend the use of these options to verify structure quality prior to data submission to the PDB archive. Following biocuration, an official wwPDB structure validation report is produced, which should be submitted to journals along with manuscripts describing the structure. Upon public release of a PDB entry, the corresponding validation report is also made public in the PDB archive.

Data archives are critical to research and education. They provide safe and secure storage of valuable scientific data, and ensure that information is freely available to the world.

Data curation is critical for these data resources. In the case of PDB, data are carefully reviewed and annotated by wwPDB curators before public release. Expert curation of data coming into PDB is critical for ensuring findability, accessibility, interoperability, and reusability (FAIR). Biocurators enforce data standardization, help to maximize data quality, provide value-added annotation, and maintain uniformity in data representation.

To better understand what goes into this important work, Science Careers spoke with RCSB PDB and other data resources.


The rewards of working as a data wrangler by Maggie Kuo
Science Careers doi: 10.1126/science.caredit.aaq0481

A new FTP repository, ftp://ftp-versioned.wwpdb.org/, now hosts versioned structural model files in PDBx/mmCIF and PDBML formats. As announced in 2017, wwPDB has introduced a versioning system to enable depositor-initiated or wwPDB-initiated updates to previously released PDB entries while retaining the same PDB accession code. Updates to atomic coordinates, polymer sequence, or chemical description in a PDB coordinate file will trigger a major version increment. All other changes will be classified as minor. All major versions of each PDB structure are retained in the new FTP archive. In the 2018 phase of the project, wwPDB will enable depositor-initiated updates of coordinates.
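The major/minor rule described above can be illustrated with a short sketch. This is not wwPDB code: the category labels and the next_version function are hypothetical, and resetting the minor number to 0 after a major increment is an assumption that the announcement does not state.

```python
# Change categories that trigger a major version increment, per the text above.
# The string labels themselves are hypothetical, chosen for this example.
MAJOR_TRIGGERS = frozenset(
    {"atomic_coordinates", "polymer_sequence", "chemical_description"}
)


def next_version(major: int, minor: int, changed: list[str]) -> tuple[int, int]:
    """Return the (major, minor) version after applying a set of changes."""
    if not changed:
        # Nothing changed, so the version stays as it is.
        return major, minor
    if MAJOR_TRIGGERS.intersection(changed):
        # Assumption: the minor number restarts at 0 on a major increment.
        return major + 1, 0
    # Any other kind of change counts as a minor revision.
    return major, minor + 1


# A coordinate update bumps the major version.
print(next_version(1, 2, ["atomic_coordinates"]))
# A citation update (hypothetical label) only bumps the minor version.
print(next_version(1, 2, ["citation"]))
```

Under this rule, only the entries produced by the first branch would add a new file to the versioned FTP archive, since the archive retains major versions.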

Additional information is available at wwPDB.org.

 

  • OneDep: Unified wwPDB System for Deposition, Biocuration, and Validation of Macromolecular Structures in the Protein Data Bank (PDB) Archive (2017) Structure 25: 536-545 doi: 10.1016/j.str.2017.01.004
  • Multivariate Analyses of Quality Metrics for Crystal Structures in the Protein Data Bank Archive (2017) Structure 25: 458-468 doi: 10.1016/j.str.2017.01.013

  • Data management: A global coalition to sustain core data (2017) Nature 543: 179 doi: 10.1038/543179a
  • Towards coordinated international support of core data resources for the life sciences (2017) bioRxiv doi: 10.1101/110825
  • Protein Data Bank (PDB): The Single Global Macromolecular Structure Archive (2017) In Methods in Molecular Biology: Protein Crystallography Methods and Protocols (eds. A. Wlodawer, Z. Dauter, M. Jaskolski) Springer New York. 627-641 doi: 10.1007/978-1-4939-7000-1
  • PDB-Dev: A prototype system for depositing integrative / hybrid structural models (2017) Structure 25: 1317-1318 doi: 10.1016/j.str.2017.08.001
  • Crystallography and Databases (2017) Data Science Journal 16: 38 doi: 10.5334/dsj-2017-038