Data Deposition/Biocuration Services and Archive Management

In the third quarter of 2023, 4,710 experimentally determined structures were deposited to the PDB archive, bringing the total for the year to 13,081 entries. Data are processed by the wwPDB partners RCSB PDB, PDBe, PDBj and PDBc.

Of the structures deposited so far in 2023, 82.8% were deposited with a release status of hold until publication, 8.7% were released as soon as annotation of the entry was complete, and 8.5% were held until a particular date. Of these entries, 62.1% were determined by X-ray crystallography, 1.8% by NMR and 36.0% by 3DEM.

During the same quarter, 3,549 structures were released in the PDB, including 128 SARS-CoV-2 structures, and 2,069 EMDB maps were released in the archive.

The number of available 3-character CCD IDs as of September 2023

At current growth rates, we anticipate running out of three-character Chemical Component Dictionary (CCD) IDs by the end of 2023. After this point, the wwPDB will issue five-character alphanumeric CCD IDs in the OneDep system. Four-character codes will not be used, to avoid confusion with current four-character PDB IDs. Because the legacy PDB file format reserves only three columns for a residue name, PDB entries containing the new five-character IDs will be distributed only in PDBx/mmCIF and PDBML formats (see previous announcement).
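
To check existing code against these length rules, a minimal Python sketch might validate IDs as follows. It assumes CCD IDs are uppercase alphanumeric, with existing IDs of one to three characters and new IDs of exactly five; the function names and the ID `A1B2C` are hypothetical, chosen only for illustration.

```python
import re

# Assumption: CCD IDs are uppercase alphanumeric. Existing IDs have 1-3
# characters, new extended IDs have exactly 5, and length 4 is never used.
_CCD_ID_PATTERN = re.compile(r"[A-Z0-9]{1,3}|[A-Z0-9]{5}")


def is_valid_ccd_id(code: str) -> bool:
    """Return True if `code` has a length and alphabet allowed for a CCD ID."""
    return _CCD_ID_PATTERN.fullmatch(code) is not None


def fits_legacy_pdb(code: str) -> bool:
    """Legacy PDB ATOM/HETATM records hold the residue name in columns 18-20,
    so only IDs of at most three characters can be written there."""
    return len(code) <= 3


for code in ["ATP", "A1B2C", "ABCD"]:  # "A1B2C" is a hypothetical new ID
    print(code, is_valid_ccd_id(code), fits_legacy_pdb(code))
```

Here `A1B2C` is accepted as a CCD ID but cannot be written to a legacy PDB file, while `ABCD` is rejected outright. Code that slices fixed columns, such as `line[17:20]` in Python, would silently truncate a five-character ID.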

In addition, the wwPDB has reserved a set of CCD IDs (01–99, DRG, INH and LIG) that will never be used in the PDB. These reserved codes can be used for new ligands during structure determination, so that the ligands can be identified as new upon deposition and added to the CCD during biocuration.
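
A deposition-preparation script could flag placeholder ligands before upload. The following is a minimal sketch, assuming "01–99" means the two-digit strings 01 through 99; the helper names and the input list are hypothetical.

```python
# Reserved CCD IDs listed by the wwPDB: 01-99, DRG, INH and LIG.
# Assumption: "01-99" means the two-digit strings "01" through "99".
RESERVED_CCD_IDS = frozenset(
    [f"{n:02d}" for n in range(1, 100)] + ["DRG", "INH", "LIG"]
)


def is_reserved_ccd_id(code: str) -> bool:
    """Return True if `code` is one of the reserved placeholder CCD IDs."""
    return code in RESERVED_CCD_IDS


def find_placeholder_ligands(residue_names):
    """Return the distinct residue names that use a reserved placeholder ID.

    `residue_names` is any iterable of component names, for example taken
    from a model being prepared for deposition (hypothetical input).
    """
    return sorted({name for name in residue_names if is_reserved_ccd_id(name)})


print(find_placeholder_ligands(["ALA", "GLY", "LIG", "HOH", "07", "LIG"]))
```

Because these codes are never assigned in the PDB, any match is a new ligand that biocuration will need to add to the CCD.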

The wwPDB asks users and software developers to review their code, remove any limits on CCD ID length, and enable support for PDBx/mmCIF format files. Example files with extended CCD IDs are available via GitHub to assist with code revisions. Information about the PDBx/mmCIF dictionary and file format is provided at mmcif.wwpdb.org.
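
As an illustration of length-agnostic parsing, the Python sketch below reads residue names from the `_atom_site` loop of a PDBx/mmCIF file by splitting on whitespace rather than slicing fixed columns, so a five-character ID (here the hypothetical `A1B2C`) survives intact. The `read_loop` helper and the sample data are assumptions for illustration; the helper ignores quoting and multi-line values, so real code should rely on an established mmCIF parser.

```python
def read_loop(text: str, category: str) -> list[dict[str, str]]:
    """Return the rows of one mmCIF loop_ as dicts keyed by item name.

    Minimal sketch: assumes whitespace-separated values with no quoting
    and no multi-line (semicolon-delimited) fields.
    """
    prefix = category + "."
    lines = [line.strip() for line in text.splitlines()]
    for i, line in enumerate(lines):
        if line != "loop_" or i + 1 >= len(lines) or not lines[i + 1].startswith(prefix):
            continue
        # Collect the item names that define the loop's columns.
        j = i + 1
        items = []
        while j < len(lines) and lines[j].startswith(prefix):
            items.append(lines[j][len(prefix):])
            j += 1
        # Data rows run until a blank line, '#', a new item, loop or block.
        rows = []
        while j < len(lines) and lines[j] and not lines[j].startswith(("#", "_", "loop_", "data_")):
            rows.append(dict(zip(items, lines[j].split())))
            j += 1
        return rows
    return []


# Hand-written sample; "A1B2C" is a hypothetical five-character CCD ID.
SAMPLE = """data_EXAMPLE
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
ATOM 1 N ALA
HETATM 2 C1 A1B2C
#
"""

comp_ids = [row["label_comp_id"] for row in read_loop(SAMPLE, "_atom_site")]
print(comp_ids)  # ['ALA', 'A1B2C']
```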

Version 1.0 of the next-generation PDB archive repository (NextGen) was made available in early 2023. The NextGen archive hosts enriched atomic coordinate files in both PDBx/mmCIF and PDBML formats, available for download at files-nextgen.wwpdb.org.

The initial launch of the NextGen archive enriched coordinate files from the core PDB archive with sequence annotations at the atom, residue and chain levels from external resources such as UniProt, SCOP2 and Pfam. Following consultation with the user community, this release adds intramolecular connectivity for each residue in an entry, to help users transition from the legacy PDB format to PDBx/mmCIF. The connectivity information, incorporated from the PDB Chemical Component Dictionary (CCD), includes atom pairs, bond order, aromatic flag and stereochemistry. Users can extract this information from the _chem_comp_bond and _chem_comp_atom categories of the PDBx/mmCIF files in the NextGen archive. Visit wwPDB for more information.
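
For example, the bonds in a `_chem_comp_bond` loop can be turned into a per-residue adjacency list. This is a minimal Python sketch: the item names (`comp_id`, `atom_id_1`, `atom_id_2`, `value_order`, `pdbx_aromatic_flag`, `pdbx_stereo_config`) come from the PDBx/mmCIF dictionary, while `read_loop`, `bond_graph` and the hand-written alanine heavy-atom sample are illustrative assumptions rather than NextGen file content.

```python
from collections import defaultdict


def read_loop(text, category):
    """Return rows of one mmCIF loop_ as dicts keyed by item name.

    Minimal sketch: assumes whitespace-separated values with no quoting
    and no multi-line fields.
    """
    prefix = category + "."
    lines = [line.strip() for line in text.splitlines()]
    for i, line in enumerate(lines):
        if line != "loop_" or i + 1 >= len(lines) or not lines[i + 1].startswith(prefix):
            continue
        j = i + 1
        items = []
        while j < len(lines) and lines[j].startswith(prefix):
            items.append(lines[j][len(prefix):])
            j += 1
        rows = []
        while j < len(lines) and lines[j] and not lines[j].startswith(("#", "_", "loop_", "data_")):
            rows.append(dict(zip(items, lines[j].split())))
            j += 1
        return rows
    return []


def bond_graph(rows):
    """Map comp_id -> atom -> list of (bonded atom, bond order) pairs."""
    graph = defaultdict(lambda: defaultdict(list))
    for row in rows:
        atoms = graph[row["comp_id"]]
        a, b, order = row["atom_id_1"], row["atom_id_2"], row["value_order"]
        # Bonds are undirected, so record each one from both ends.
        atoms[a].append((b, order))
        atoms[b].append((a, order))
    return graph


# Hand-written heavy-atom subset for alanine, for illustration only.
SAMPLE = """data_ALA
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.value_order
_chem_comp_bond.pdbx_aromatic_flag
_chem_comp_bond.pdbx_stereo_config
ALA N  CA  SING N N
ALA CA C   SING N N
ALA C  O   DOUB N N
ALA CA CB  SING N N
ALA C  OXT SING N N
#
"""

graph = bond_graph(read_loop(SAMPLE, "_chem_comp_bond"))
print(graph["ALA"]["C"])  # [('CA', 'SING'), ('O', 'DOUB'), ('OXT', 'SING')]
```

Recording each bond from both ends makes neighbour lookups a single dictionary access, which is typically what code ported from legacy CONECT-style handling needs.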

In October 2023, the wwPDB will roll out updated CCD data files with standardized atom naming and additional annotation of the protein backbone and terminal atoms within peptide residues. Entries containing these components will be updated accordingly. This will improve the Findability and Interoperability of PDB data and open up new opportunities to use the peptide residue annotation.

As part of this remediation, the wwPDB will add new data items to the CCD files for peptide-linking components to label the atoms that form the backbone and the N- and C-terminal groups. Three new data items, pdbx_backbone_flag, pdbx_n-terminal_flag and pdbx_c-terminal_flag, will be added to the CCD category _chem_comp_atom to flag backbone, N-terminal and C-terminal atoms, respectively.
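
Once these flags are present, selecting backbone or terminal atoms becomes a simple filter. The Python sketch below uses the item names as given in this announcement (check the released dictionary for the final spelling) and assumes the flags take `Y`/`N` values; the `read_loop` and `flagged_atoms` helpers and the alanine flag values are hypothetical, not copied from a released CCD file.

```python
def read_loop(text, category):
    """Return rows of one mmCIF loop_ as dicts keyed by item name.

    Minimal sketch: assumes whitespace-separated values with no quoting
    and no multi-line fields.
    """
    prefix = category + "."
    lines = [line.strip() for line in text.splitlines()]
    for i, line in enumerate(lines):
        if line != "loop_" or i + 1 >= len(lines) or not lines[i + 1].startswith(prefix):
            continue
        j = i + 1
        items = []
        while j < len(lines) and lines[j].startswith(prefix):
            items.append(lines[j][len(prefix):])
            j += 1
        rows = []
        while j < len(lines) and lines[j] and not lines[j].startswith(("#", "_", "loop_", "data_")):
            rows.append(dict(zip(items, lines[j].split())))
            j += 1
        return rows
    return []


def flagged_atoms(rows, flag):
    """Return atom_id values whose `flag` item is 'Y' (assumed flag value)."""
    return [row["atom_id"] for row in rows if row.get(flag) == "Y"]


# Illustrative heavy-atom sample; flag values are assumptions.
SAMPLE = """data_ALA
loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
_chem_comp_atom.type_symbol
_chem_comp_atom.pdbx_backbone_flag
_chem_comp_atom.pdbx_n-terminal_flag
_chem_comp_atom.pdbx_c-terminal_flag
ALA N   N Y Y N
ALA CA  C Y N N
ALA C   C Y N N
ALA O   O Y N N
ALA CB  C N N N
ALA OXT O N N Y
#
"""

rows = read_loop(SAMPLE, "_chem_comp_atom")
for flag in ("pdbx_backbone_flag", "pdbx_n-terminal_flag", "pdbx_c-terminal_flag"):
    print(flag, flagged_atoms(rows, flag))
```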

Furthermore, we will standardize the names of peptide backbone atoms in CCD files according to a set of rules covering carboxyl groups, amino groups and the carbon to which the side chain is attached (C-alpha). This will allow clear identification of backbone atoms in peptide residues across the whole archive. Visit wwPDB for detailed information.