Data Deposition/Biocuration Services and Archive Management

In the first quarter of 2022, 4,340 experimentally determined structures were deposited to the archive. Data are processed by wwPDB partners RCSB PDB, PDBe, and PDBj.

Of all structures deposited, 79.7% were deposited with a release status of hold until publication; 9.5% were released as soon as annotation of the entry was complete; and 10.8% were held until a particular date. 66.6% of these entries were determined by X-ray crystallographic methods; 2.0% were determined by NMR methods; and 31.4% by 3DEM.

During the same period, 3,391 structures were released, including 313 SARS-CoV-2 structures. In addition, 1,298 EMDB maps were released in the archive.

ModelCIF, an extension of PDBx/mmCIF for computed structure models, is now available. The PDBx/mmCIF data standard underpins the Protein Data Bank (PDB) Core Archive, which is jointly managed by the worldwide Protein Data Bank (wwPDB) consortium. A software library called python-modelcif has been developed to support ModelCIF and enables reading and writing mmCIF files compliant with ModelCIF.

ModelCIF serves as the data standard for representing structural models of macromolecules obtained using computational methods. These computed structure models may be derived from existing structure templates using homology or comparative modeling, or they can be obtained from ab initio modeling methods. The ModelCIF data standard is being adopted by computational biologists as well as by major repositories of computed structure models, including ModelArchive, MODBASE, and the AlphaFold Protein Structure Database (AlphaFoldDB). Partial support for ModelCIF is also available in SWISS-MODEL projects and will soon be added to the SWISS-MODEL Repository.


ModelCIF Working Group

ModelCIF is developed and maintained by the wwPDB ModelCIF Working Group (WG), consisting of representatives from the wwPDB and the computational structural biology community. The WG is focused on developing common data standards and software tools for archiving and visualization of computed structure models. The WG promotes adoption of ModelCIF within the computational modeling community and is also involved in developing software tools that support ModelCIF. Research teams making computed structure models available from their own web portals are strongly encouraged to do so using the ModelCIF data standard and to integrate them into the 3D-Beacons network. Structural biologists are strongly encouraged to deposit computed structure models to the ModelArchive to ensure long-term preservation and public access. Guidelines on how to deposit computed structure models together with relevant metadata are also available on ELIXIR's RDMkit page for structural bioinformatics.

Extensions to the PDBx/mmCIF dictionary for reflection data with anisotropic diffraction limits, for unmerged reflection data, and for quality metrics of anomalous diffraction data have been developed by a subgroup of the wwPDB PDBx/mmCIF Working Group and are now supported in OneDep. These extensions facilitate the deposition and archiving of a broader range of diffraction data, as well as new quality metrics pertaining to these data.

wwPDB strongly encourages structural biologists to always use the latest versions of structure determination software packages to produce data files for PDB deposition. wwPDB also encourages crystallographers wishing to deposit new structures together with their associated diffraction data to use software that guarantees consistency between the data and the final model. This consistency is difficult to achieve when separate diffraction data files and model coordinate files are pieced together a posteriori by ad hoc means.

wwPDB also encourages depositors to make their raw diffraction images available from one of the public repositories to allow direct access to the original diffraction image data.

Visit wwPDB.org for detailed information about these extensions.

Starting May 3, 2022, the PDB archive will distribute assembly files in PDBx/mmCIF format, allowing direct access and visualization of the curated assemblies for all PDB entries.

Currently, PDBx/mmCIF formatted assembly files are provided for structures that are non-PDB compliant; however, the coordinates use model numbers to differentiate alternate symmetry copies of PDB chain IDs. This method is neither ideal nor necessary for the current archive PDBx/mmCIF format and has led to limited use of these files in community software tools. In response to this issue and to recommendations by the wwPDB advisory committee, we are implementing updated, standardized practices for generation of assembly files for all PDB entries.

These updated PDBx/mmCIF format assembly files will have improved organization of assembly data to support usage by the community. These files will include all symmetry-generated copies of each chain within a single model, with distinct chain IDs (_atom_site.auth_asym_id and _atom_site.label_asym_id) assigned to each. Generation of distinct chain IDs in assembly files is based upon the following rules:

  1. The applied index of the symmetry operator (_pdbx_struct_oper_list.id) will be appended to the original chain ID, separated by a dash (e.g., A-2, A-3, etc.).
  2. If more than one type of symmetry operator is applied to generate a symmetry copy, a dash will be used between the operator indices (e.g., A-12-60, A-60-88, etc.).
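
The two naming rules above can be sketched in a few lines of Python. This is only an illustration of the convention as described here, not wwPDB code: the function names are hypothetical, and the handling of a chain with no operator listed (the original ID is returned unchanged) is an assumption.

```python
from typing import List, Sequence, Tuple


def assembly_chain_id(chain_id: str, operator_ids: Sequence[object]) -> str:
    """Build the chain ID for one symmetry copy in an assembly file.

    chain_id     -- original chain ID, e.g. "A"
    operator_ids -- _pdbx_struct_oper_list.id values applied to the chain,
                    in the order they are combined, e.g. [12, 60]
    """
    # Assumption: with no operator listed, the original ID is kept as-is.
    return "-".join([chain_id, *(str(op) for op in operator_ids)])


def split_assembly_chain_id(assembly_id: str) -> Tuple[str, List[str]]:
    """Recover the original chain ID and the operator indices.

    Assumption: the original chain ID itself contains no dash.
    """
    chain_id, *operator_ids = assembly_id.split("-")
    return chain_id, operator_ids


single = assembly_chain_id("A", [2])          # one operator applied
combined = assembly_chain_id("A", [12, 60])   # two operators combined
```

Software that needs to map an assembly chain back to the deposited chain could use the second function, or rely on the remapping categories described below.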

In addition, entity ID and chain ID mapping categories will be provided: _pdbx_entity_remapping and _pdbx_chain_remapping.

A new directory (ftp.wwpdb.org/pub/pdb/data/assemblies/mmCIF/) will be created for the distribution of these updated assembly files. The directory containing the existing assembly mmCIF files for large entries will be removed (ftp.wwpdb.org/pub/pdb/data/biounit/mmCIF/).

wwPDB asks all PDB users and software developers to review code and address any limitations related to PDB assemblies. Sample files are made available for testing purposes and to support community adoption at GitHub.com/wwpdb/assembly-mmcif-examples.
If you plan to use these assembly files for graphical viewing, check whether your visualization software (e.g., PyMol, ChimeraX, etc.) supports instantiation of assemblies directly from atomic coordinate files (_struct_assembly related categories). If it does, we recommend you do so for improved efficiency.
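
To illustrate what instantiating an assembly from the deposited coordinates involves, the following minimal sketch applies a single symmetry operator (a 3x3 rotation matrix plus a translation vector, the form in which operators are recorded in _pdbx_struct_oper_list) to a list of atom positions. The function name, the tuple-based data layout, and the example operator are illustrative assumptions and do not come from any wwPDB tool; real viewers also work out which operators apply to which chains.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def apply_operator(
    coords: Sequence[Vec3],
    matrix: Sequence[Sequence[float]],
    vector: Sequence[float],
) -> List[Vec3]:
    """Return the coordinates transformed as x' = matrix * x + vector."""
    transformed = []
    for x, y, z in coords:
        # One output component per matrix row, plus the matching shift.
        new_x, new_y, new_z = (
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(matrix, vector)
        )
        transformed.append((new_x, new_y, new_z))
    return transformed


# Hypothetical operator: 180-degree rotation about z, then a shift of 10 along x.
two_fold = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
translation = [10.0, 0.0, 0.0]

symmetry_copy = apply_operator([(1.0, 2.0, 3.0)], two_fold, translation)
```

Generating copies on demand in this way avoids storing every symmetry-related atom explicitly, which is the efficiency gain referred to above.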

For any further information please email info@wwpdb.org.

Individual Chemical Component entries are available for download from the PDB Archive as individual files

In response to user requests, individual Chemical Component Dictionary (CCD) and Biologically Interesting molecule Reference Dictionary (BIRD) definitions are now accessible in an FTP tree in the PDB archive. These individual CCD and BIRD entry files can be found at /pdb/refdata/chem_comp/ and /pdb/refdata/bird/, respectively, in sub-directories named for the last character of the entry ID.

For example:

  1. /pdb/refdata/chem_comp/C/D8C/D8C.cif
  2. /pdb/refdata/bird/prd/8/PRD_001068.cif
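
The layout shown in these two examples can be reproduced with a short helper. The sketch below is inferred only from the two example paths above; the function names are hypothetical, and converting IDs to upper case is an assumption.

```python
def ccd_path(comp_id: str) -> str:
    """Archive path of one Chemical Component Dictionary entry, e.g. D8C."""
    comp_id = comp_id.upper()  # assumption: IDs are stored in upper case
    # Sub-directory named for the last character, then one directory per entry.
    return f"/pdb/refdata/chem_comp/{comp_id[-1]}/{comp_id}/{comp_id}.cif"


def bird_path(prd_id: str) -> str:
    """Archive path of one BIRD entry, e.g. PRD_001068."""
    prd_id = prd_id.upper()  # assumption: IDs are stored in upper case
    # Sub-directory named for the last character; the file sits directly in it.
    return f"/pdb/refdata/bird/prd/{prd_id[-1]}/{prd_id}.cif"


ccd_example = ccd_path("D8C")
bird_example = bird_path("PRD_001068")
```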

Visit wwPDB.org for more information.

As announced previously, deposition of half-maps for single-particle, single-particle-based helical, and sub-tomogram averaging reconstructions to the EM Data Bank (EMDB) has been mandatory as of February 25, 2022. This change is in response to a long-standing community request to the wwPDB EMDB Core Archive and was also a recommendation from the 2020 wwPDB single-particle cryo-EM data-management workshop (white paper in preparation). Several recommendations from this workshop have already been implemented in the wwPDB OneDep system. These include improvements to wwPDB validation reports and enhancements for capturing metadata via the deposition interface.

For more information, visit wwPDB.org.