Newsletter | Summer 2017 ⋅ Number 74

Data Deposition and Annotation

In the second quarter of 2017, 3332 experimentally determined structures were deposited to the PDB archive, bringing the total for the year to 7109 entries.

Of the structures deposited in 2017 so far, 71.9% were deposited with a release status of hold until publication; 23.9% were released as soon as annotation of the entry was complete; and 4.2% were held until a particular date. 92.4% of these entries were determined by X-ray crystallographic methods; 3.4% were determined by NMR methods. 

During the same quarter, 2737 structures were released in the PDB archive.

Scientists from SGC and UCB recently used Diamond Light Source to develop a new method for extracting previously hidden information from the X-ray diffraction data measured when determining the three-dimensional (3D) atomic structures of proteins and other biological molecules.

The new Pan-Dataset Density Analysis (PanDDA) method reveals the bound compound in exceptionally clear and unambiguous detail. PanDDA first identifies the source of the noise and then removes it from the data. It exploits Diamond’s ability to repeat dozens to hundreds of measurements quickly: the resulting datasets are compared to find differences that indicate the presence of a bound compound, and a noise correction is then applied in 3D.
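The compare-then-correct idea described above can be illustrated with a toy sketch. This is not the PanDDA implementation: it treats each density map as a short list of numbers, and the function names (z_scores, subtract_background), the sample values, and the subtraction fraction of 0.5 are all hypothetical choices made for illustration. It assumes only that many reference measurements are available, that unusual points in a new measurement stand out against their spread, and that a scaled copy of the average reference signal can be subtracted as a correction.

```python
from statistics import mean, stdev

def z_scores(reference_maps, test_map):
    """Score each point of test_map against the spread of the reference maps."""
    scores = []
    for value, column in zip(test_map, zip(*reference_maps)):
        spread = stdev(column)
        # Guard against a zero spread, which would otherwise divide by zero.
        scores.append((value - mean(column)) / spread if spread > 0 else 0.0)
    return scores

def subtract_background(test_map, reference_maps, fraction):
    """Subtract a scaled copy of the average reference map from test_map."""
    return [
        value - fraction * mean(column)
        for value, column in zip(test_map, zip(*reference_maps))
    ]

# Hypothetical data: three reference measurements and one test measurement
# whose last point differs strongly from the references.
references = [[1.0, 2.0, 1.0], [1.2, 2.1, 0.9], [0.8, 1.9, 1.1]]
test_measurement = [1.0, 2.0, 3.0]

outlier_scores = z_scores(references, test_measurement)
corrected = subtract_background(test_measurement, references, fraction=0.5)
```

In this toy data, only the last point receives a large score, which mirrors how a difference between datasets can flag a bound compound.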

The 860 structures represent four protein targets, with 785 ground state structures and 75 ligand-bound structures. Explore the structures by searching “PanDDA analysis group deposition” at the RCSB PDB website.

This set of structures is the largest submission to the PDB associated with a single publication. Through close collaboration with the authors, the 860 structures were deposited in only 8 sessions using a new tool being developed by RCSB PDB.

The RCSB PDB Group Deposition system (GroupDep) supports automated depositions of large numbers of X-ray structures in parallel. It allows PDB depositors to take advantage of local templates and the PDB_extract program for batch processing, data packaging, upload, review, validation, and one-click submission of many closely related structures at once. Structures submitted via GroupDep can be received, biocurated, validated, and publicly released in a relatively short period of time.

GroupDep is being developed in part to support data coming from the NIH-NIGMS funded Drug Design Data Resource (D3R), which aims to advance the technology of computer-aided drug discovery through the interchange of high quality protein-ligand datasets and workflows, and by holding community-wide, blinded docking/scoring prediction challenges.

Depositors interested in testing the GroupDep system with a large dataset should contact info@rcsb.org.

For more information on this milestone, please see Game-changing PanDDA method unveils previously hidden 3D structure data at Diamond, and Pearce, N. M. et al. A multi-crystal method for extracting obscured crystallographic states from conventionally uninterpretable electron density. Nat. Commun. 8, 15123 (2017) doi: 10.1038/ncomms15123

Cell Press’ CrossTalk blog interviewed Biocurator Chenghua Shao on his recent publication examining the impact of the wwPDB’s OneDep system for biocuration and validation on PDB data quality:

Multivariate Analyses of Quality Metrics for Crystal Structures in the Protein Data Bank Archive. C. Shao, H. Yang, J. D. Westbrook, J. Y. Young, C. Zardecki, S. K. Burley. Structure (2017) 25: 458–468 doi: 10.1016/j.str.2017.01.013

These analyses show that structure quality improved with the wwPDB OneDep system relative to legacy PDB systems. There was, however, little improvement in ligand quality. As Chenghua notes in his interview, the wwPDB is working to change this by incorporating recommendations developed at the 2015 Ligand Validation Workshop into the system.

A paper describing the OneDep system was also recently published in Structure.


In 2017, the wwPDB plans to introduce a new procedure that allows the Depositor of Record (defined as the Principal Investigator for the entry) to manage substantial revisions to previously released PDB archival entries.

At present, revised atomic coordinates for an existing released PDB entry are assigned a new accession code, and the prior entry is obsoleted. This long-standing wwPDB policy has had the unintended consequence of breaking connections with publications and with usage of the prior set of atomic coordinates, creating a non-trivial barrier to submission of atomic coordinate revisions by our Depositors of Record.

The wwPDB is introducing a file versioning system that allows Depositors of Record to update their own previously released entries. Please note that, in the first phase, file versioning will be applied to atomic coordinates refined against unchanged experimental data.

The version number of each PDB archive entry will be designated using a #-# identifier. The first number specifies the major version, and the second designates the minor version. The Structure of Record (i.e., the initial set of released atomic coordinates) is designated as Version 1-0. Thereafter, the major version number is incremented with each substantial revision of a given entry (e.g., Version 2-0, when the atomic coordinates are replaced for the first time by the Depositor of Record). “Major version changes” are defined as updates to the atomic coordinates, polymer sequence(s), and/or chemical identity of a ligand. All other changes are defined as “minor changes”. When a major change is made, the minor version number is reset to 0 (e.g., 1-0 to 1-1 to 2-0). For the avoidance of doubt, the wwPDB will retain all major versions, each with its latest minor version, within the PDB archive.
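The numbering rules in the preceding paragraph can be sketched in a few lines of Python. This is a minimal sketch for illustration, not wwPDB software: the class name EntryVersion, the change labels in MAJOR_CHANGES, and the helper retained_versions are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Hypothetical labels for the change types the text defines as "major":
# atomic coordinates, polymer sequence(s), or chemical identity of a ligand.
MAJOR_CHANGES = {"coordinates", "polymer_sequence", "ligand_identity"}

@dataclass(frozen=True)
class EntryVersion:
    major: int = 1  # the Structure of Record is Version 1-0
    minor: int = 0

    def bump(self, change: str) -> "EntryVersion":
        """Return the version that follows the given kind of change."""
        if change in MAJOR_CHANGES:
            # A major change increments the major number and resets the minor.
            return EntryVersion(self.major + 1, 0)
        # Every other change counts as minor.
        return EntryVersion(self.major, self.minor + 1)

    def __str__(self) -> str:
        return f"{self.major}-{self.minor}"

def retained_versions(history):
    """Keep, for each major version, only its latest minor version."""
    latest = {}
    for version in history:
        kept = latest.get(version.major)
        if kept is None or version.minor > kept.minor:
            latest[version.major] = version
    return [latest[major] for major in sorted(latest)]

v1_0 = EntryVersion()
v1_1 = v1_0.bump("citation")     # minor change
v2_0 = v1_1.bump("coordinates")  # major change
print(v1_0, v1_1, v2_0)  # prints "1-0 1-1 2-0"
```

Under the retention rule, retained_versions([v1_0, v1_1, v2_0]) keeps 1-1 and 2-0 and drops 1-0.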

Current wwPDB policies governing the deposition of independently refined structures based on the data generated by a research group or laboratory separate from that of the Depositor of Record remain unchanged. Versioning of atomic coordinates will be strictly limited to substitutions made by the Depositor of Record.

Upon introduction of the file versioning system, the wwPDB will revise each PDB accession code by extending its length and prepending “PDB” (e.g., “1abc” will become “pdb_00001abc”). This change will make PDB entries easier to detect by text mining in the published literature and allow for more informative and transparent delivery of revised data files. For example, the atomic coordinates for the second major version of PDB entry 1abc would have the following form under the new file-naming schema:

pdb_00001abc_xyz_v2-0.cif.gz
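As a rough illustration of the naming schema, the sketch below builds the extended accession code and the versioned file name from a four-character code. It generalizes from the single example given here, so two points are assumptions rather than stated policy: that the extended code is the lowercase original left-padded with zeros to eight characters, and that coordinate files always carry the "_xyz" label and ".cif.gz" suffix. The function names are hypothetical.

```python
import re

def extended_pdb_id(code: str) -> str:
    """Convert a four-character code such as '1abc' to 'pdb_00001abc'.

    Assumes zero-padding to eight characters, inferred from the one
    example in the text.
    """
    if not re.fullmatch(r"[0-9][A-Za-z0-9]{3}", code):
        raise ValueError(f"not a four-character PDB code: {code!r}")
    return "pdb_" + code.lower().zfill(8)

def versioned_coordinate_filename(code: str, major: int, minor: int = 0) -> str:
    """Build a file name of the form pdb_00001abc_xyz_v2-0.cif.gz."""
    return f"{extended_pdb_id(code)}_xyz_v{major}-{minor}.cif.gz"

print(versioned_coordinate_filename("1abc", 2))  # prints "pdb_00001abc_xyz_v2-0.cif.gz"
```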

The wwPDB is mindful of the importance of continuity in providing services and supporting User activities. For as long as practicable, the wwPDB will continue assigning PDB codes that can be truncated losslessly to the current four-character style. In the same spirit, initial implementation of entry file versioning will appear in a new, parallel branch of the PDB archive FTP tree. More details on the new FTP tree organization and accessibility of version information will be forthcoming. Data files in the current archive location ftp://ftp.wwpdb.org/pub/pdb/data/structures/ will continue to use the familiar naming style and will contain the latest version from the corresponding versioned archive.
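One way to read "truncated losslessly" is that an extended code maps back to a four-character code only when the characters being dropped are all zeros. The sketch below checks that condition. It is an interpretation rather than wwPDB policy: the eight-character body after "pdb_" is inferred from the "pdb_00001abc" example, and the function name is hypothetical.

```python
import re
from typing import Optional

# Assumed layout: "pdb_" followed by eight lowercase alphanumeric characters.
_EXTENDED_ID = re.compile(r"pdb_([0-9a-z]{8})")

def truncate_to_legacy(extended: str) -> Optional[str]:
    """Return the four-character code, or None if truncation would lose information."""
    match = _EXTENDED_ID.fullmatch(extended.lower())
    if match is None:
        raise ValueError(f"not an extended PDB code: {extended!r}")
    body = match.group(1)
    # Dropping the first four characters is lossless only if they are all zeros.
    return body[4:] if body.startswith("0000") else None

print(truncate_to_legacy("pdb_00001abc"))  # prints "1abc"
```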