Data Deposition/Biocuration Services and Archive Management

In the first quarter of 2021, 3728 experimentally-determined structures were deposited to the archive.  Data are processed by wwPDB partners RCSB PDB, PDBe, and PDBj.

Of all structures deposited, 84.1% were deposited with a release status of hold until publication; 8.7% were released as soon as annotation of the entry was complete; and 7.2% were held until a particular date. 71.6% of these entries were determined by X-ray crystallographic methods; 3.0% were determined by NMR methods; and 25.4% by 3DEM.

During the same period, 3264 structures were released, including 329 SARS-CoV-2 structures.  1114 EMDB maps were released in the archive.

Reports for every released set of EM model coordinates in the PDB and every released EMDB map entry are now available to provide quantitative and visual assessments of structure quality and enable archive-wide comparisons.

Examples of recent improvements include images for deposited masks, improved map-model overlay images, visualization of an approximate raw map from two half-maps, and rotationally averaged power spectrum plots. The underlying methodology is continually improved, based on community requirements, requests and feedback.

Visit wwPDB.org for details.

Example of map-model overlay image: EMD-30388/7CWU, SARS-CoV-2 spike proteins trimer in complex with P17 and FC05 Fabs cocktail.

One of the important processes in wwPDB curation of PDB entries is the definition of assemblies for each structure. This helps users of PDB data to understand the structure in the context of its complex formation in the specific experimental conditions.

To ensure that assemblies are curated correctly, annotators review them during curation and report them back to the depositors once curation is complete.

The deposition system in OneDep has now been enhanced so that after curation, the annotated assembly is displayed in the Mol* 3D viewer for depositors to review. This viewer is available in a new Review section in the deposition interface, which is present after curation of the entry. The Mol* viewer can display PDB structure data within the browser with minimal memory requirements, therefore making it quick and easy to visually display assembly information.

These changes will help improve the validation and reporting of curated assemblies during the deposition process.

The assembly review page, as displayed for depositors after curation of the entry. The curated assembly is displayed in the Mol* 3D viewer, within the browser.

A new article in Structure describes recent additions to wwPDB validation reports, including branched representations and 2D SNFG images for carbohydrates, identification of ligands of interest, 3D views of electron density fit, and 2D images of small molecule geometry.

These enhancements and processes for validation of 3D small-molecular structures reflect recommendations from the wwPDB/CCDC/D3R Ligand Validation Workshop and the adoption of software through community collaborations.

This manuscript also highlights enhancements made since the initial implementation of Validation Reports as described in Validation of the Structures in the Protein Data Bank (2017) Structure 25: 1916-1927 doi: 10.1016/j.str.2017.10.009.

Enhanced Validation of Small-Molecule Ligands and Carbohydrates in the Protein Data Bank
(2021) Structure doi: 10.1016/j.str.2021.02.004

In 2014, PDBx/mmCIF became the PDB’s archive format and the legacy PDB file format was frozen. In addition to PDBx/mmCIF files for all entries, wwPDB produces legacy PDB format files for entries that can be represented in that format. Entries that cannot, such as those with over 99,999 atoms or with multi-character chain IDs, are only available in PDBx/mmCIF.
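The two limits named above can be expressed as a simple check. The following is a minimal sketch, not wwPDB software: the `EntrySummary` class and the `fits_legacy_pdb` function are hypothetical names introduced for illustration, and the check covers only the two limits mentioned here, not every restriction of the legacy format.

```python
from dataclasses import dataclass

# Limits taken from the text above; the constant names are illustrative only.
MAX_LEGACY_ATOMS = 99_999   # entries with more atoms are PDBx/mmCIF only
MAX_CHAIN_ID_LENGTH = 1     # multi-character chain IDs are PDBx/mmCIF only


@dataclass
class EntrySummary:
    """Hypothetical summary of an entry; not a wwPDB data structure."""
    atom_count: int
    chain_ids: list


def fits_legacy_pdb(entry: EntrySummary) -> bool:
    """Return True if the entry passes the two limits named in the text."""
    if entry.atom_count > MAX_LEGACY_ATOMS:
        return False
    return all(len(chain_id) <= MAX_CHAIN_ID_LENGTH for chain_id in entry.chain_ids)


small = EntrySummary(atom_count=4_500, chain_ids=["A", "B"])
large = EntrySummary(atom_count=250_000, chain_ids=["A", "AA", "AB"])
print(fits_legacy_pdb(small))  # True
print(fits_legacy_pdb(large))  # False
```

An entry that fails either test would be distributed in PDBx/mmCIF only.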

As the size and complexity of PDB structures increase, additional limitations of the legacy PDB format are becoming apparent and need to be addressed.

Defining complex sheet records

Restrictions in the SHEET record fields in the legacy PDB file format do not allow for the generation of complex beta sheet topology. Complex beta sheet topologies include instances where beta strands are part of multiple beta sheets and other cases where the definition of the strands within a beta sheet cannot be presented in a linear description. For example, in PDB entry 5wln a large beta barrel structure is created from multiple copies of a single protein; within the beta sheet forming the barrel are instances of a single beta strand making contacts on one side with multiple other strands, even from different chains.

This limitation, however, is not an issue in the PDBx/mmCIF formatted file, where these complex beta sheet topologies can be captured in _struct_sheet, _struct_sheet_order, _struct_sheet_range, and _struct_sheet_hbond.
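To illustrate why a pairwise description handles such cases, the sketch below treats a sheet as a list of strand pairs, modelled on the range_id_1 and range_id_2 items of _struct_sheet_order. It assumes that a linear, strand-by-strand listing gives each strand at most two neighbours (the previous and the next), so any strand paired with three or more others cannot be placed in such a listing. The function name and the example data are hypothetical and are not taken from any PDB entry.

```python
from collections import defaultdict


def branching_strands(pairings):
    """Return strand IDs that pair with more than two other strands.

    `pairings` is an iterable of (range_id_1, range_id_2) tuples describing
    adjacent strands. A non-empty result means the sheet cannot be written
    as a single linear sequence of strands.
    """
    neighbours = defaultdict(set)
    for first, second in pairings:
        neighbours[first].add(second)
        neighbours[second].add(first)
    return sorted(s for s, partners in neighbours.items() if len(partners) > 2)


# Made-up examples: a simple four-stranded sheet and a sheet in which
# strand "2" contacts three other strands.
simple_sheet = [("1", "2"), ("2", "3"), ("3", "4")]
branched_sheet = [("1", "2"), ("2", "3"), ("2", "4")]

print(branching_strands(simple_sheet))    # []
print(branching_strands(branched_sheet))  # ['2']
```

Because each pairing is stored as its own row, the branched case needs no special handling in the pairwise form; only the attempt to flatten it into one ordered list fails. The check is a necessary condition only and does not cover strands shared between different sheets.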

Starting June 8th 2021, legacy PDB format files will no longer be generated for PDB entries where the SHEET topology cannot be generated. For these structures, wwPDB will continue to provide secondary structure information with helix and sheet information in the PDBx/mmCIF formatted file.

Deprecation of _struct_site (SITE) records

wwPDB regularly reviews the software used during OneDep biocuration. The _struct_site and _struct_site_gen categories in PDBx/mmCIF (SITE records in the legacy PDB file format) are generated by in-house software and based purely upon distance calculations, and therefore may not reflect biologically functional sites.
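The following sketch shows, in general terms, what a purely distance-based site definition looks like and why it can include residues with no functional role. It is not the wwPDB in-house software: the function name, the input layout, and the 4.0 Å cutoff are all assumptions made for illustration.

```python
import math


def residues_near_ligand(ligand_atoms, residue_atoms, cutoff=4.0):
    """Return labels of residues with any atom within `cutoff` of the ligand.

    ligand_atoms:  list of (x, y, z) coordinates in Å
    residue_atoms: dict mapping a residue label to a list of (x, y, z)
    cutoff:        hypothetical distance threshold in Å (an assumption)
    """
    near = []
    for label, coords in residue_atoms.items():
        if any(math.dist(a, b) <= cutoff for a in coords for b in ligand_atoms):
            near.append(label)
    return near


# Made-up coordinates: one ligand atom at the origin and three residues.
ligand = [(0.0, 0.0, 0.0)]
residues = {
    "HIS 57": [(3.0, 0.0, 0.0)],
    "GLY 12": [(2.5, 2.5, 0.0)],
    "LEU 99": [(10.0, 0.0, 0.0)],
}

print(residues_near_ligand(ligand, residues))  # ['HIS 57', 'GLY 12']
```

Every residue inside the cutoff is reported, whether or not it takes part in binding or catalysis, which is the limitation described above.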

Starting in June 2021, the in-house legacy software which produces _struct_site and _struct_site_gen records will be retired and wwPDB will no longer generate these categories for newly-deposited PDB entries. Existing entries will be unaffected.