Data Deposition/Biocuration Services and Archive Management

In the third quarter of 2022, 4184 experimentally determined structures were deposited to the PDB archive, for a total of 12669 entries deposited so far in the year. Data are processed by wwPDB partners RCSB PDB, PDBe, PDBj and PDBc.

Of the structures deposited in 2022 so far, 84.7% were deposited with a release status of hold until publication; 7.8% were released as soon as annotation of the entry was complete; and 7.5% were held until a particular date. 65.2% of these entries were determined by X-ray crystallographic methods; 1.7% were determined by NMR methods; and 33.0% by 3DEM.

During the same quarter, 3766 structures were released in the PDB, including 413 SARS-CoV-2 structures. 1786 EMDB maps were released in the archive.

wwPDB validation of 3DEM structures for which there is both a model and an EM volume will include the Q-score metric (Pintilie, G., et al., 2020, Nat. Methods). This follows recommendations from the wwPDB/EMDB workshop on cryo-EM data management, deposition and validation in 2020 (white paper in preparation), as well as EM Validation Challenge events (Lawson C., et al., 2020, Struct. Dyn.; Lawson, et al., 2021, Nat. Methods). This will be the first quantitative parameter of residue and chain resolvability for EM maps in wwPDB validation reports and will provide an additional map-model assessment criterion.

The Q-score quantifies the resolvability of atoms by measuring the similarity of the map values around each atom to a Gaussian-like function for a well-resolved atom. A Q-score of 1 indicates perfect similarity, whereas a value closer to 0 indicates low similarity. If the atom is not well placed in the map, a negative Q-score may be reported. Q-score values in the reports will therefore range from -1 to +1.
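As a rough illustration of why the value falls between -1 and +1, the sketch below treats the similarity as a correlation about the mean between map values sampled around an atom and a reference Gaussian profile. This is not the wwPDB or published implementation: the function name q_score_sketch, the flat list of distance/value samples, and the default width sigma=0.6 (in Angstroms) are all assumptions made for illustration.

```python
import math


def q_score_sketch(distances, map_values, sigma=0.6):
    """Correlation about the mean between sampled map values and a
    reference Gaussian evaluated at the same distances from the atom.

    Illustrative only: the sampling scheme and sigma are assumptions.
    """
    if len(distances) != len(map_values) or len(distances) < 2:
        raise ValueError("need two or more paired samples")

    # Reference profile for a well-resolved atom (hypothetical width).
    reference = [math.exp(-0.5 * (d / sigma) ** 2) for d in distances]

    # Subtract the means so only the shape of each profile is compared.
    u_mean = sum(map_values) / len(map_values)
    v_mean = sum(reference) / len(reference)
    u = [x - u_mean for x in map_values]
    v = [x - v_mean for x in reference]

    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        # A completely flat profile carries no shape information.
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / norm
```

Because the means are subtracted and the result is normalized, the value depends only on how closely the shape of the sampled profile follows the reference: a matching profile gives a value near 1, a flat one gives 0, and an inverted one gives a negative value.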

The wwPDB EM validation reports will provide Q-scores for single particle, helical reconstruction, electron crystallography and subtomogram averaging entries for which both an EM map and coordinate model have been deposited.

Validation reports (PDF files) will contain images of the average per-residue Q-scores color-mapped onto ribbon models with views from three orthogonal directions. Similar images will also be introduced to visualize the per-residue atom-inclusion scores. Comparison of these two sets of images will assist in visual assessment of the model-to-map fit and quality.

The images below show the model with each residue colored according to its Q-score.

Example showing mostly cyan colors indicating Q-score closer to 1 and a good resolvability of atoms.

Example showing mostly red colors indicating Q-score closer to 0 and poor resolvability of atoms.

The validation reports will also contain a table of average per-chain values of both metrics (Q-score and atom inclusion) as well as their overall average values for the entire model.

The per-residue and the per-chain average atom-inclusion and Q-score values will also be provided in the mmCIF and XML formatted validation files. Visit wwPDB.org for details.

wwPDB, in collaboration with the PDBx/mmCIF Working Group, plans to extend the length of accession codes (IDs) for PDB and Chemical Component Dictionary (CCD) entries in the future. PDB entries containing these extended IDs will not be supported by the legacy PDB file format.

CCD ID extension

CCD entries are currently identified by unique three-character alphanumeric IDs. At current growth rates, we anticipate running out of three-character IDs before 2024. After this point, the wwPDB will issue five-character alphanumeric accession codes for CCD IDs in the OneDep system. To avoid confusion with current four-character PDB IDs, four-character codes will not be used. Owing to limitations of the legacy PDB file format, PDB entries containing the new five-character ID codes will only be distributed in PDBx/mmCIF format.

In addition, wwPDB has reserved a set of CCD IDs: 01 - 99, DRG, INH, LIG that will never be used in the PDB. These reserved codes can be used for new ligands during structure determination so that they can be identified as new upon deposition and added to the CCD during biocuration.
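A structure-determination pipeline could check whether a ligand code belongs to this reserved set before deposition. The following is a minimal sketch, assuming that "01 - 99" means the two-character codes 01 through 99 and that codes are compared case-insensitively; the function name is_reserved_ccd_id is hypothetical.

```python
RESERVED_NAMED_CODES = {"DRG", "INH", "LIG"}


def is_reserved_ccd_id(ccd_id: str) -> bool:
    """Return True if ccd_id is one of the reserved placeholder codes.

    Assumes the numeric reserved range is the two-digit codes 01..99.
    """
    code = ccd_id.strip().upper()
    if code in RESERVED_NAMED_CODES:
        return True
    # Two ASCII digits, excluding "00", which is not in the stated range.
    return (
        len(code) == 2
        and code.isascii()
        and code.isdigit()
        and code != "00"
    )
```

For instance, is_reserved_ccd_id("LIG") and is_reserved_ccd_id("07") are true, while a regular CCD ID such as "ATP" is not reserved.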

PDB ID extension

wwPDB will be extending PDB ID length to eight characters prefixed by ‘pdb’, e.g., pdb_00001abc. Each PDB entry has a corresponding Digital Object Identifier (DOI), often required for manuscript submission to journals and described in publications by the structure authors. Extended PDB IDs and corresponding PDB DOIs have been included in the PDBx/mmCIF formatted atomic coordinate files for all new and re-released entries since August 2021.

For example, a PDB entry issued with the 4-character PDB ID 1abc will have the extended PDB ID (pdb_00001abc) and corresponding PDB DOI (10.2210/pdb1abc/pdb), as listed in the _database_2 PDBx/mmCIF category.

loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB 1abc pdb_00001abc 10.2210/pdb1abc/pdb

For example, a PDB entry issued with an 8-character PDB ID, pdb_00099xyz, after all 4-character IDs are consumed:

loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB pdb_00099xyz pdb_00099xyz 10.2210/pdb_00099xyz/pdb

After all four-character PDB IDs are consumed, newly deposited PDB entries will only be issued extended PDB ID codes, and PDB entries will only be distributed in PDBx/mmCIF format. PDB entries with four-character PDB IDs will remain unchanged.
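The mapping from a four-character ID to its extended form and DOI, as shown in the 1abc example, can be sketched as follows. The function names are hypothetical, lowercase normalization is an assumption based on the examples, and the pattern accepts any four alphanumeric characters rather than encoding any stricter rule for legacy IDs.

```python
import re

# Assumption: a legacy PDB ID is any four ASCII alphanumeric characters.
LEGACY_PDB_ID = re.compile(r"[0-9A-Za-z]{4}")


def _normalize_legacy(pdb_id: str) -> str:
    code = pdb_id.strip().lower()
    if not LEGACY_PDB_ID.fullmatch(code):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return code


def to_extended_pdb_id(pdb_id: str) -> str:
    """Pad a four-character ID to eight characters and add the pdb_ prefix."""
    return "pdb_" + _normalize_legacy(pdb_id).zfill(8)


def legacy_pdb_doi(pdb_id: str) -> str:
    """Build the DOI pattern shown above for a four-character ID."""
    return f"10.2210/pdb{_normalize_legacy(pdb_id)}/pdb"
```

With these helpers, to_extended_pdb_id("1abc") gives pdb_00001abc and legacy_pdb_doi("1abc") gives 10.2210/pdb1abc/pdb, matching the _database_2 row in the first example. DOIs for entries issued only with extended IDs follow the second example and are deliberately not derived here.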

Resources

wwPDB is asking users and software developers to review their code, remove any current limitations on PDB and CCD ID lengths, and enable use of PDBx/mmCIF format files. Example files with extended PDB and/or CCD IDs are available on GitHub to assist code revisions; see https://github.com/wwPDB/extended-wwPDB-identifier-examples. To learn about PDBx/mmCIF, please visit https://mmcif.wwpdb.org/.
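One common limitation is validation code that hard-codes a four-character PDB ID or a three-character CCD ID. The sketch below shows one way such checks might be relaxed. The function names and patterns are assumptions for illustration, not a wwPDB specification; in particular, accepting CCD IDs of one to three characters alongside five-character IDs is an assumption about existing data.

```python
import re
from typing import Optional

# Hypothetical patterns based on the ID shapes described in this article.
PDB_LEGACY = re.compile(r"[0-9A-Za-z]{4}")
PDB_EXTENDED = re.compile(r"pdb_[0-9A-Za-z]{8}", re.IGNORECASE)


def classify_pdb_id(value: str) -> Optional[str]:
    """Return "extended", "legacy", or None for an unrecognized string."""
    if PDB_EXTENDED.fullmatch(value):
        return "extended"
    if PDB_LEGACY.fullmatch(value):
        return "legacy"
    return None


def is_plausible_ccd_id(value: str) -> bool:
    """Accept current short IDs and future five-character IDs.

    Four-character codes are rejected because they will not be issued.
    """
    return value.isascii() and value.isalnum() and len(value) in (1, 2, 3, 5)
```

Storing IDs in variable-length string fields, rather than fixed-width columns, avoids the same problem at the data-model level.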

For any further information please contact us at info@wwpdb.org.

The number of available 3-character CCD IDs annually.