Data Deposition/Biocuration Services and Archive Management

In the third quarter of 2021, 3,676 experimentally determined structures were deposited to the PDB archive, bringing the total deposited in 2021 to 10,997 entries. Data are processed by the wwPDB partners RCSB PDB, PDBe, and PDBj.

Of the structures deposited in 2021 so far, 87.0% were deposited with a release status of hold until publication, 7.4% were released as soon as annotation of the entry was complete, and 5.6% were held until a particular date. Of these entries, 69.2% were determined by X-ray crystallography, 2.6% by NMR, and 28.0% by 3DEM.

During the same quarter, 30,97 structures were released, including 162 SARS-CoV-2 structures, and 1,184 EMDB maps were released in the archive.

Improved access to small molecule definitions

Individual Chemical Component Dictionary (CCD) and Biologically Interesting molecule Reference Dictionary (BIRD) definitions are now available in a new FTP tree in the PDB archive. In response to user requests, individual CCD and BIRD entry files can be found under /pdb/refdata/chem_comp/ and /pdb/refdata/bird/, respectively, in subdirectories named after the last character of each ID.
For example:

  • /pdb/refdata/chem_comp/C/D8C/D8C.cif
  • /pdb/refdata/bird/prd/8/PRD_001068.cif
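The directory layout can be captured in two small helper functions. This is a sketch that infers the layout only from the two example paths above (the last character of the ID as a subdirectory, a per-ID subdirectory for CCD files, and a prd/ subdirectory for BIRD files); the function names are hypothetical, and BIRD files stored outside prd/ are not covered.

```python
def ccd_path(comp_id: str) -> str:
    """Path of an individual CCD definition in the refdata FTP tree.

    Layout inferred from the D8C example: a subdirectory named after the
    last character of the ID, then one named after the full ID.
    """
    comp_id = comp_id.upper()  # CCD IDs are written in upper case
    return f"/pdb/refdata/chem_comp/{comp_id[-1]}/{comp_id}/{comp_id}.cif"


def bird_path(prd_id: str) -> str:
    """Path of an individual BIRD (PRD_) definition.

    Layout inferred from the PRD_001068 example: prd/, then a subdirectory
    named after the last character of the ID, then the file itself.
    """
    prd_id = prd_id.upper()
    return f"/pdb/refdata/bird/prd/{prd_id[-1]}/{prd_id}.cif"


print(ccd_path("D8C"))          # /pdb/refdata/chem_comp/C/D8C/D8C.cif
print(bird_path("PRD_001068"))  # /pdb/refdata/bird/prd/8/PRD_001068.cif
```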

Improved access to information about PDB archive holdings

New inventory data files offer a quick overview of the data in the archive. These files are in gzip-compressed JSON format and can be found under the new /pdb/holdings/ FTP tree.
The inventory lists provided include:

  • all_removed_entries.json.gz: List of removed PDB entries (obsolete entries and models) with entry authors, entry title, release date, obsolete date, and superseding PDB ID, if any.
  • current_file_holdings.json.gz: List of released PDB entries and file types present for each entry in the PDB Core Archive (e.g., coordinate data, experimental data, validation report, ...)
  • obsolete_structures_last_modified_dates.json.gz: List of obsolete PDB entries with last time of PDBx/mmCIF file modification
  • refdata_id_list.json.gz: List of released chemical reference entries, content types (e.g., Chemical Component, BIRD), and last time of reference file modification
  • released_structures_last_modified_dates.json.gz: List of released PDB entries with last time of PDBx/mmCIF file modification
  • unreleased_entries.json.gz: List of on-hold PDB entries, entry status, deposition date, and sequence pre-release information
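Each inventory file can be read with Python's standard gzip and json modules. The sketch below assumes the file has already been downloaded from the /pdb/holdings/ tree. Because the JSON structure of each file is not described here, the demo writes a small stand-in file with a hypothetical layout (an object keyed by PDB ID, with placeholder values) so that it runs offline; load_inventory is a hypothetical name.

```python
import gzip
import json
import os
import tempfile


def load_inventory(path: str):
    """Read one of the gzip-compressed JSON inventory files."""
    # gzip.open in text mode ("rt") decompresses and decodes in one step.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


# Offline demo: write a small stand-in file. This layout (an object keyed
# by PDB ID, with placeholder dates) is hypothetical, not the real schema.
sample = {"1abc": "2021-01-01", "2xyz": "2021-01-01"}
demo_path = os.path.join(tempfile.gettempdir(), "demo_holdings.json.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as fh:
    json.dump(sample, fh)

inventory = load_inventory(demo_path)
print(len(inventory), "entries listed")  # 2 entries listed
```

Inspecting the top-level keys of a downloaded file is a sensible first step before writing code that depends on its layout.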

The inventory (index) files historically provided in /pdb/derived_data/ will continue to be updated for the time being but will eventually be removed from the PDB archive. Users are encouraged to switch to the new inventory files.

wwPDB manages the PDB Core Archives as a public good according to the FAIR Principles. In support of the FAIR objectives, wwPDB has replaced its historical data access license with the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.

The new CC0 license provides the same open access as the prior license: PDB data remain freely available to all users, including commercial users.

The 2021 wwPDB Charter Agreement and Usage Policy have been updated to reflect the new license.

Users of PDB data are encouraged to attribute the original authors of the PDB structure data where possible.

Journals, PDB users, and software developers should review their code and prepare for the new PDB ID format and the inclusion of PDB DOIs in PDBx/mmCIF files

wwPDB, in collaboration with the PDBx/mmCIF Working Group, plans to extend the length of PDB and Chemical Component Dictionary (CCD) ID codes. These extended formats are not supported by the legacy PDB file format.

As announced previously, extended PDB IDs consist of eight characters prefixed by ‘pdb_’, e.g., pdb_00001abc.

Each PDB ID is issued a corresponding Digital Object Identifier (DOI), which journals often require at manuscript submission and which structure authors cite in their publications.
To help depositors provide this information to journals, OneDep now displays the PDB ID and DOI on the submission confirmation page.
The extended PDB IDs and corresponding PDB DOIs, along with the existing four-character PDB IDs, are now included in PDBx/mmCIF files. Initially, they are included only in updated and newly released PDB entries; the rest of the archive will be updated at a later date.
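Code that stores or validates PDB IDs may need to map four-character IDs to the new forms. The sketch below derives the extended ID and the DOI from a four-character ID, assuming the rule implied by the 1ABC example given for the _database_2 category (lower-case ID, zero-padded to eight characters, prefixed by pdb_; DOI of the form 10.2210/pdb{id}/pdb). The function names are hypothetical; for released entries, reading the values from the PDBx/mmCIF file is preferable to deriving them.

```python
def _check_four_char(pdb_id: str) -> str:
    """Validate a four-character PDB ID and return it in lower case."""
    if len(pdb_id) != 4 or not (pdb_id.isascii() and pdb_id.isalnum()):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return pdb_id.lower()


def extended_pdb_id(pdb_id: str) -> str:
    """1ABC -> pdb_00001abc (assumed rule: zero-pad to 8 characters, prefix pdb_)."""
    return "pdb_" + _check_four_char(pdb_id).rjust(8, "0")


def pdb_doi(pdb_id: str) -> str:
    """1ABC -> 10.2210/pdb1abc/pdb (pattern taken from the 1ABC example)."""
    return f"10.2210/pdb{_check_four_char(pdb_id)}/pdb"


print(extended_pdb_id("1ABC"))  # pdb_00001abc
print(pdb_doi("1ABC"))          # 10.2210/pdb1abc/pdb
```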

The additional accessions are provided in the _database_2 PDBx/mmCIF category. For example, PDB entry 1ABC will have the extended PDB ID (pdb_00001abc) and the corresponding PDB DOI (10.2210/pdb1abc/pdb).

loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB 1abc pdb_00001abc 10.2210/pdb1abc/pdb
WWPDB D_1xxxxxxxxx ? ?
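To pick these accessions out of a file, a loop like the one above can be read with a small tokenizer. This is a simplified sketch, not a general PDBx/mmCIF parser: it assumes whitespace-separated, unquoted values, as in the example, and parse_simple_loop is a hypothetical name. Real files can contain quoted and multi-line values, for which a dedicated mmCIF parsing library is the safer choice.

```python
def parse_simple_loop(text: str, category: str) -> list:
    """Return the rows of one loop_ of `category` as a list of dicts."""
    prefix = f"_{category}."
    tags: list = []
    values: list = []
    state = "search"  # search -> tags -> values
    for raw in text.splitlines():
        line = raw.strip()
        if state == "search":
            if line == "loop_":
                state, tags = "tags", []
            continue
        if state == "tags":
            if line.startswith(prefix):
                tags.append(line[len(prefix):])
                continue
            if not tags:
                # This loop_ belongs to another category: keep searching.
                state = "search"
                continue
            state = "values"  # first data line; handled below
        if not line or line.startswith(("_", "#", "loop_")):
            break  # end of this loop's data rows
        values.extend(line.split())
    n = len(tags)
    if n == 0 or len(values) % n:
        raise ValueError(f"no well-formed loop for category {category!r}")
    return [dict(zip(tags, values[i:i + n])) for i in range(0, len(values), n)]


example = """\
loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB 1abc pdb_00001abc 10.2210/pdb1abc/pdb
WWPDB D_1xxxxxxxxx ? ?
"""

rows = parse_simple_loop(example, "database_2")
pdb_row = next(r for r in rows if r["database_id"] == "PDB")
print(pdb_row["pdbx_database_accession"], pdb_row["pdbx_DOI"])
# pdb_00001abc 10.2210/pdb1abc/pdb
```

The "?" placeholders in the WWPDB row are returned as literal "?" strings; callers should treat them as missing values.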

Once all available four-character PDB IDs have been used, newly deposited PDB entries will be issued only extended PDB IDs, and these entries will be distributed only in PDBx/mmCIF format.
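Software that still writes the legacy PDB format can check whether an extended ID has a four-character equivalent. The sketch below assumes, by extrapolation from the pdb_00001abc example, that only extended IDs beginning with pdb_0000 map to a four-character ID; legacy_pdb_id and the test ID pdb_10001abc are hypothetical.

```python
import re

# Assumed extended-ID shape, based on the pdb_00001abc example:
# "pdb_" followed by eight lower-case alphanumeric characters.
EXTENDED_ID = re.compile(r"pdb_[0-9a-z]{8}")


def legacy_pdb_id(extended_id: str):
    """Return the four-character ID for an extended PDB ID, or None.

    Assumption: only IDs whose first four characters after "pdb_" are
    zeros have a four-character (legacy-format-compatible) form.
    """
    ext = extended_id.lower()
    if not EXTENDED_ID.fullmatch(ext):
        raise ValueError(f"not an extended PDB ID: {extended_id!r}")
    body = ext[4:]  # the eight characters after "pdb_"
    if body.startswith("0000"):
        return body[4:].upper()  # four-character IDs are usually shown in upper case
    return None  # no four-character form: PDBx/mmCIF only


print(legacy_pdb_id("pdb_00001abc"))  # 1ABC
print(legacy_pdb_id("pdb_10001abc"))  # None
```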

wwPDB asks journals, users, and software developers to review their code and remove any limitations that assume four-character PDB IDs.

wwPDB continues to support research, education, and drug discovery worldwide. Open access to PDB data has helped researchers in the structure-guided discovery and development of anti-coronavirus drugs, vaccines, and neutralizing antibodies. When researchers analyze existing PDB structures, for example while working on a related structure, they often need additional information that cannot be obtained from the PDB entry file alone. In particular, there is no way to find a point of contact for an entry that has no associated primary publication.

Following a recommendation from the IUCr Commission on Biological Macromolecules and the IUCr Committee on Data, wwPDB now makes public the PI name, email address, and ORCID iD for initial PDB depositions and re-submissions made from the date this policy takes effect. This will make it possible to contact the authors of every PDB structure released from that point on, and it aligns the PDB with the standard practice of scientific journals, which provide corresponding-author information.

Each depositor's dated acceptance of the PDB Terms and Conditions described above is recorded in the OneDep system. The depositor who creates a deposition should make the entry PI(s) aware of this policy change, under which the PI name, email address, and ORCID iD are included in public PDBx/mmCIF files.