Data Deposition/Biocuration Services and Archive Management

In the second quarter of 2022, 4145 experimentally determined structures were deposited to the PDB archive, bringing the total for the year to 8511 entries. Data are processed by wwPDB partners RCSB PDB, PDBe, and PDBj.

Of the structures deposited in 2022 so far, 82.9% were deposited with a release status of hold until publication; 9.1% were released as soon as annotation of the entry was complete; and 8.0% were held until a particular date. Of these entries, 65.9% were determined by X-ray crystallographic methods, 1.7% by NMR methods, and 32.2% by 3DEM.

During the same quarter, 3232 structures were released in the PDB, including 226 SARS-CoV-2 structures. In addition, 2018 EMDB maps were released in the archive.

As of May 3, 2022, the PDB archive distributes assembly files in PDBx/mmCIF format, allowing direct access and visualization of the curated assemblies for all PDB entries (original announcement).

Previously, PDBx/mmCIF formatted assembly files were provided for structures that are not compliant with the legacy PDB format; however, the coordinates in those files used model numbers to differentiate alternate symmetry copies of PDB chain IDs. This method is neither ideal nor necessary for the current archive PDBx/mmCIF format and has led to limited use of these files in community software tools. In response to this issue and to recommendations by the wwPDB advisory committee, we have implemented updated, standardized practices for generating assembly files for all PDB entries.

These updated PDBx/mmCIF format assembly files have improved organization of assembly data to support usage by the community. The files include all symmetry-generated copies of each chain within a single model, with distinct chain IDs (_atom_site.auth_asym_id and _atom_site.label_asym_id) assigned to each. Generation of distinct chain IDs in assembly files is based upon the following rules:

  • The applied index of the symmetry operator (_pdbx_struct_oper_list.id) is appended to the original chain ID, separated by a dash (e.g., A-2, A-3)
  • If more than one type of symmetry operator is applied to generate a symmetry copy, a dash is also used between the operator IDs (e.g., A-12-60, A-60-88)

In addition, entity ID and chain ID mapping categories are provided: _pdbx_entity_remapping and _pdbx_chain_remapping.
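
The chain ID rules above can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of any wwPDB tool. The sketch assumes that original chain IDs contain no dash, and it returns the chain ID unchanged when no operator is given; that last behavior is a convenience of the sketch, not a rule stated above.

```python
def assembly_chain_id(chain_id: str, operator_ids: list[str]) -> str:
    """Build an assembly chain ID by appending operator IDs with dashes.

    Hypothetical helper: an empty operator list returns the chain ID
    unchanged (a convenience of this sketch, not a documented rule).
    """
    return "-".join([chain_id, *operator_ids])


def split_assembly_chain_id(assembly_id: str) -> tuple[str, list[str]]:
    """Recover (original chain ID, operator IDs) from an assembly chain ID.

    Assumes the original chain ID itself contains no dash.
    """
    chain_id, *operator_ids = assembly_id.split("-")
    return chain_id, operator_ids


# One operator applied to chain A
print(assembly_chain_id("A", ["2"]))          # A-2
# Two operators applied in combination
print(assembly_chain_id("A", ["12", "60"]))   # A-12-60
# Reverse direction
print(split_assembly_chain_id("A-60-88"))     # ('A', ['60', '88'])
```

Software that needs the authoritative correspondence between original and assembly identifiers should read the _pdbx_chain_remapping category rather than rely on string parsing of this kind.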

A new directory (ftp.wwpdb.org/pub/pdb/data/assemblies/mmCIF/) was created for the distribution of these updated assembly files. The directory containing the existing assembly mmCIF files for large entries has been removed (ftp.wwpdb.org/pub/pdb/data/biounit/mmCIF/).

wwPDB asks all PDB users and software developers to review code and address any limitations related to PDB assemblies. Sample files were made available for testing purposes and to support community adoption at GitHub.com/wwpdb/assembly-mmcif-examples.

If you plan to use these assembly files for graphical viewing, check whether your visualization software (e.g., PyMOL, ChimeraX) supports instantiation of assemblies directly from atomic coordinate files (_struct_assembly related categories), which is more efficient.
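
As a rough illustration of what instantiating an assembly from atomic coordinates involves, the sketch below applies a single symmetry operator (a 3x3 rotation matrix plus a translation vector, as stored in the matrix and vector items of _pdbx_struct_oper_list) to a list of coordinates. The function name and the example operator values are assumptions made for illustration; viewers such as those named above implement this internally and far more efficiently.

```python
from typing import Sequence

Coord = tuple[float, float, float]


def apply_operator(
    coords: Sequence[Coord],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> list[Coord]:
    """Return coords transformed as x' = R * x + t (hypothetical helper)."""
    transformed = []
    for xyz in coords:
        new_xyz = tuple(
            sum(rotation[i][j] * xyz[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        transformed.append(new_xyz)
    return transformed


# Illustrative operator (made-up values): a two-fold rotation about the
# z axis followed by a 10 Angstrom shift along x.
two_fold_z = [
    [-1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0],
    [0.0, 0.0, 1.0],
]
shift = [10.0, 0.0, 0.0]

chain_a = [(1.0, 2.0, 3.0)]
chain_a_copy = apply_operator(chain_a, two_fold_z, shift)
print(chain_a_copy)  # [(9.0, -2.0, 3.0)]
```

Generating copies on the fly in this way avoids storing every symmetry-related atom explicitly, which is why support for it in viewing software is worth checking.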

For any further information please email info@wwpdb.org.

Journal of Molecular Biology Cover

A new article by the wwPDB and the PDBx/mmCIF Working Group describes the community-driven data representation for structural biology data that is critical to the PDB archive. It describes file standards and governance, and summarizes software tools for data processing and checking.

PDBx/mmCIF Ecosystem: Foundational Semantic Tools for Structural Biology
John D. Westbrook, Jasmine Y. Young, Chenghua Shao, Zukang Feng, Vladimir Guranovic, Catherine L. Lawson, Brinda Vallat, Paul D. Adams, John M. Berrisford, Gerard Bricogne, Kay Diederichs, Robbie P. Joosten, Peter Keller, Nigel W. Moriarty, Oleg V. Sobolev, Sameer Velankar, Clemens Vonrhein, David G. Waterman, Genji Kurisu, Helen M. Berman, Stephen K. Burley, Ezra Peisach
(2022) Journal of Molecular Biology 434: 167599 doi: 10.1016/j.jmb.2022.167599

This article is part of the Journal of Molecular Biology's special issue on Computational Resources for Molecular Biology 2022.  It is dedicated to John D. Westbrook, whose work established the PDBx/mmCIF data dictionary and format as the foundation of the modern Protein Data Bank (PDB) archive (wwPDB.org).