Data Deposition/Biocuration Services and Archive Management

In the first quarter of 2020, 3961 experimentally determined structures were deposited to the archive. Data are processed by wwPDB partners RCSB PDB, PDBe, and PDBj.

81.1% were deposited with a release status of hold until publication; 14.4% were released as soon as annotation of the entry was complete; and 4.5% were held until a particular date.

82.5% of these entries were determined by X-ray crystallographic methods; 3.3% by NMR methods; and 14.2% by 3DEM.

During the same period, 3064 structures and 714 EMDB maps were released in the PDB.

Authors (PIs) of released PDB structures can update model coordinates while retaining the same PDB accession code, thereby preserving the link with the original publication. In this second and final phase of the project, versioning functionality, previously limited to structures deposited using the OneDep system, has been extended to all entries.

For entries submitted via OneDep (2014-onwards), depositors should log into the corresponding session at deposit.wwpdb.org and submit the request via the OneDep communication panel. For entries submitted via legacy systems, requests should be emailed to deposit-help@mail.wwpdb.org with the PDB code included in the subject and body of the email.

Once submitted, the revised model will be processed by wwPDB biocurators, and the updated version will be released immediately upon the depositor’s approval. Versioning of PDB entries will be limited to changes in the coordinate files, with no changes permitted to the deposited experimental data. To limit the impact on wwPDB biocuration resources, PDB versioning is currently restricted to one replacement per PDB entry per year, and three entries per Principal Investigator per year. This restriction will be reviewed in 2021.
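The two limits above can be read as a simple eligibility check. The following is a minimal sketch only, not wwPDB's actual procedure: the function name, the `history` structure, and the interpretation of "per year" as a calendar year are all assumptions made for illustration.

```python
# Limits as stated in the text above.
MAX_REPLACEMENTS_PER_ENTRY_PER_YEAR = 1
MAX_ENTRIES_PER_PI_PER_YEAR = 3


def may_request_versioning(pi, pdb_id, year, history):
    """Return True if a new replacement request fits within both limits.

    `history` is a hypothetical list of (pi, pdb_id, year) tuples, one per
    replacement already accepted. "Per year" is assumed to mean calendar year.
    """
    this_year = [(p, e) for p, e, y in history if y == year]

    # Limit 1: at most one replacement per PDB entry per year.
    replacements_for_entry = sum(1 for _, e in this_year if e == pdb_id)
    if replacements_for_entry >= MAX_REPLACEMENTS_PER_ENTRY_PER_YEAR:
        return False

    # Limit 2: at most three distinct entries per Principal Investigator per year.
    entries_by_pi = {e for p, e in this_year if p == pi}
    return len(entries_by_pi | {pdb_id}) <= MAX_ENTRIES_PER_PI_PER_YEAR
```

For example, a PI who has already replaced three different entries in 2020 would be refused a fourth in that year under this reading, but not in 2021.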

The most recent version of the entry will be made available in the PDB archive FTP (ftp.wwpdb.org). All major versions of a PDB structure will be retained in the versioned FTP archive (ftp-versioned.wwpdb.org); more information can be found at wwPDB.org. The structure of the versioned FTP archive allows for future extension of the PDB ID code format. PDB entry 1abc would therefore be found in the folder pdb_00001abc.
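The folder naming in the 1abc example can be sketched in a few lines. This is an illustration inferred from that single example: the function name is hypothetical, and the rule (lower-case the ID, left-pad with zeros to eight characters, prefix with "pdb_") is an assumption rather than a documented specification.

```python
def versioned_folder_name(pdb_id: str) -> str:
    """Return the versioned-archive folder name for a 4-character PDB ID."""
    pdb_id = pdb_id.strip().lower()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"expected a 4-character PDB ID, got {pdb_id!r}")
    # Assumption: zero-pad to 8 characters, matching 1abc -> pdb_00001abc.
    return "pdb_" + pdb_id.zfill(8)


print(versioned_folder_name("1abc"))  # pdb_00001abc
```

Only the folder name is derived here; the directory layout above it in the versioned archive is not described in this text and is deliberately left out.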

Changes made to entries during versioning are considered to be either major or minor. Updates to atomic coordinates, polymer sequence, or chemical description trigger a major version increment, while changes to any other categories are classified as minor. Changes introduced are recorded in the PDBx/mmCIF audit categories.
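The major/minor rule described above can be expressed as a small classifier. This is a sketch under stated assumptions: the change-type labels are hypothetical names chosen for illustration, not PDBx/mmCIF category names, and the function does not read the audit categories themselves.

```python
from typing import Iterable

# Hypothetical labels for the three kinds of change that the text says
# trigger a major version increment.
MAJOR_CHANGE_TYPES = {
    "atomic_coordinates",
    "polymer_sequence",
    "chemical_description",
}


def classify_revision(changed: Iterable[str]) -> str:
    """Classify a revision as 'major' or 'minor' from its changed areas."""
    changed = set(changed)
    if not changed:
        raise ValueError("a revision must change at least one area")
    # Any overlap with the major set makes the whole revision major;
    # everything else is minor.
    return "major" if changed & MAJOR_CHANGE_TYPES else "minor"


print(classify_revision(["citation"]))                        # minor
print(classify_revision(["citation", "atomic_coordinates"]))  # major
```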

In July 2020, wwPDB will roll out updated PDB structures and reference data files with standardized representation of carbohydrate molecules, improving the Findability and Interoperability of PDB data. Detailed information about this work is available from the wwPDB website, including PDBx/mmCIF dictionary extensions and over 500 example files. Developers of software packages that produce, access, or visualize PDB data are encouraged to review this information and adapt their software.

Through collaboration with the glycoscience community, software tools were developed to standardize atom nomenclature of nearly 800 monosaccharides in the Chemical Component Dictionary (CCD) and to apply a branched polymeric representation to oligo- and polysaccharides within the PDB archive, enabling easy translation to other representations commonly used by glycobiologists. To guarantee unambiguous chemical description of oligo-/polysaccharides in each of the nearly 12,000 affected PDB entries, an explicit description of covalent linkage information between their monomeric units is included. To ensure continued Findability of common oligosaccharides (e.g., sucrose, Lewis X factor), the Biologically Interesting molecule Reference Dictionary (BIRD), which will contain the covalent linkage information and common synonyms for such molecules, has been expanded.

The organization of chemical synonyms in the CCD will be improved by introducing a new _pdbx_chem_comp_synonyms data category. This will enable more comprehensive capture of alternative names for small molecules in the PDB. To minimize disruption to users, there will be an initial transition period, where the legacy data item, _chem_comp.pdbx_synonyms, will be retained.

OneDep now accepts NMR experimental data as a single file in NMR-STAR or NEF format. This will start the transition away from the current practice of uploading distinct types of NMR data (e.g., assigned chemical shifts, restraints, and peak lists) separately.

NMR-STAR is the official wwPDB format for storing NMR data, supported by an extensive dictionary [GitHub; Ulrich, E. L. et al. (2019) Journal of Biomolecular NMR, 73: 5–9. doi:10.1007/s10858-018-0220-3], while NEF (NMR exchange format; Gutmanas et al. (2015) Nature Structural & Molecular Biology 22: 433–434. doi:10.1038/nsmb.3041) is a lightweight format and dictionary, supported by the leading software in NMR structure determination. The use of these two interconvertible standard formats as single data files will simplify data deposition, storage, and distribution.

For newly deposited entries accompanied by such a unified data file, NMR data will be distributed in the PDB FTP area as single files in the NMR-STAR format. A best-effort conversion to the NEF format will also be provided. These unified NMR data files will be added to a new FTP directory, “nmr_data”, in parallel to the existing directories, “nmr_restraints” and “nmr_chemical_shifts”. In addition, to support existing users, these unified files, which contain both restraints and chemical shift data, will be copied to the existing directories “nmr_restraints” and “nmr_chemical_shifts”.

A standardized naming convention for NMR unified data will also be developed to simplify access to the relevant NMR data. File names will start with the PDB accession code, followed by nmr_data and a format-specific extension, for example ‘2lcb_nmr_data.nef’ or ‘2lcb_nmr_data.str’.
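The naming pattern in the two examples above can be sketched as follows. The function name and the format keys ("nmr-star", "nef") are hypothetical, and the convention is described as still being developed, so this only reproduces the two example file names given in the text.

```python
# Extensions taken from the examples in the text: .str for NMR-STAR, .nef for NEF.
NMR_DATA_EXTENSIONS = {"nmr-star": "str", "nef": "nef"}


def nmr_data_filename(pdb_id: str, fmt: str) -> str:
    """Build a unified NMR data file name such as '2lcb_nmr_data.nef'."""
    key = fmt.strip().lower()
    if key not in NMR_DATA_EXTENSIONS:
        raise ValueError(f"unsupported NMR data format: {fmt!r}")
    # Accession code first, then 'nmr_data', then the format extension.
    return f"{pdb_id.strip().lower()}_nmr_data.{NMR_DATA_EXTENSIONS[key]}"


print(nmr_data_filename("2lcb", "nef"))       # 2lcb_nmr_data.nef
print(nmr_data_filename("2lcb", "nmr-star"))  # 2lcb_nmr_data.str
```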