Data Deposition/Biocuration Services and Archive Management

In the first quarter of 2024, 4,595 experimentally-determined structures were deposited to the PDB archive, for a total of 4,595 entries deposited in the year to date. Data are processed by wwPDB partners RCSB PDB, PDBe, PDBj and PDBc.

Of the structures deposited in 2024 so far, 86.6% were deposited with a release status of hold until publication; 8.7% were released as soon as annotation of the entry was complete; and 4.7% were held until a particular date. Of these entries, 56.0% were determined by X-ray crystallographic methods, 1.5% by NMR methods, and 42.3% by 3DEM.

During the same quarter, 3,599 structures were released in the PDB.

2,278 EMDB maps were released in the archive.

Sample extended PDB ID

wwPDB anticipates that all four-character PDB accession codes (PDB IDs) will be consumed by 2029.

With the continuous growth of the PDB archive, wwPDB has revised the PDB accession code format by extending its length and prepending “PDB” (e.g., "1abc" will become "pdb_00001abc"). This change will make PDB entries easier to detect by text mining of the published literature and will allow for more informative and transparent delivery of revised data files.
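As a minimal sketch of what this format change could mean for software, the Python below converts a legacy ID to the extended form and scans free text for extended IDs. It assumes, based only on the single example above ("1abc" becomes "pdb_00001abc"), that the extended ID is "pdb_" followed by eight lowercase alphanumeric characters and that legacy IDs are left-padded with zeros. The function names and the regular expression are hypothetical and are not part of any wwPDB tool.

```python
import re

# Assumption: an extended ID is "pdb_" plus 8 alphanumeric characters,
# inferred from the example "1abc" -> "pdb_00001abc".
EXTENDED_ID = re.compile(r"\bpdb_[0-9a-z]{8}\b", re.IGNORECASE)


def to_extended_pdb_id(legacy_id: str) -> str:
    """Convert a four-character PDB ID to the assumed extended form."""
    code = legacy_id.strip().lower()
    if len(code) != 4 or not (code.isascii() and code.isalnum()):
        raise ValueError(f"not a four-character PDB ID: {legacy_id!r}")
    # Left-pad with zeros to 8 characters, then add the "pdb_" prefix.
    return "pdb_" + code.rjust(8, "0")


def find_extended_ids(text: str) -> list[str]:
    """Return extended PDB IDs mentioned in free text, lowercased."""
    return [match.group(0).lower() for match in EXTENDED_ID.finditer(text)]


print(to_extended_pdb_id("1abc"))
print(find_extended_ids("See pdb_00001abc and PDB_12345678."))
```

The fixed "pdb_" prefix is what makes the second function practical: a bare four-character code such as "1abc" is hard to tell apart from ordinary words in running text, while the prefixed form is not.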

Entries with extended PDB IDs (12 characters) will not be compatible with the legacy PDB file format once four-character PDB IDs are consumed. wwPDB encourages scientific journals, the PDB community, and users to transition to the PDBx/mmCIF format and the extended PDB ID format as soon as possible.

Resources are available to help PDB users with this transition through the wwPDB resource portal page (Extended PDB ID With 12 Characters). This page links to useful resources for handling this change, including an FAQ on PDB ID extension, materials to learn more about PDBx/mmCIF format, and links to other PDBx/mmCIF resources and software tools. As the transition phase progresses, more training resources will be added to this page.

Additionally, a PDB “beta” archive will be provided during the transition phase in 2026. The directory structure of this “beta” archive will mirror the data organization of the PDB Versioned Archive in the form of https://files-beta.org/pub/pdb/data/entries/two-letter-hash/pdb_accession_code/entry_data_File_names. The two-letter hash will consist of the third-to-last and second-to-last characters of the accession code. For example, PDB entry PDB_12345678 will be under /67/. This will maintain consistency with the current PDB archive, where, for example, PDB entry 1abc is under /ab/.
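The hash rule can be illustrated with a short Python sketch. It takes the two characters immediately before the last character of the accession code, which reproduces both examples above (/67/ for PDB_12345678 and /ab/ for 1abc). The base URL is copied from the path pattern in the text; the function names and the lowercasing of the hash and the directory name are assumptions made for illustration, not documented wwPDB behavior.

```python
# Base path as written in the text above; treat it as a placeholder.
BASE_URL = "https://files-beta.org/pub/pdb/data/entries"


def two_letter_hash(accession: str) -> str:
    """Return the third-to-last and second-to-last characters of an ID."""
    code = accession.strip()
    if len(code) < 4:
        raise ValueError(f"accession code too short: {accession!r}")
    # Slice [-3:-1] skips the final character and keeps the two before it.
    # Assumption: hash directories are lowercase.
    return code[-3:-1].lower()


def beta_entry_dir(extended_id: str) -> str:
    """Build the assumed entry directory URL in the beta archive."""
    # Assumption: the entry directory uses the lowercase accession code.
    code = extended_id.strip().lower()
    return f"{BASE_URL}/{two_letter_hash(code)}/{code}/"


print(two_letter_hash("PDB_12345678"))
print(two_letter_hash("1abc"))
print(beta_entry_dir("pdb_00001abc"))
```

Because the same slice works for both four-character and extended codes, a legacy entry and its extended counterpart land under the same hash directory (1abc and pdb_00001abc both map to ab).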

Once all four-character PDB accession codes are consumed, this PDB “beta” archive will become the PDB main archive and the current PDB archive will be removed.

Download example files containing extended PDB IDs for software adoption from GitHub.

wwPDB recently announced that PDB three-character Chemical Component IDs have been consumed. Five-character alphanumeric accession codes for CCD IDs are now issued by the OneDep system.

For further information, please contact us at info@wwpdb.org.

New archive snapshots are available.

A snapshot of the PDB Core archive (ftp://ftp.wwpdb.org, https://s3.rcsb.org) as of January 2, 2024 has been added to ftp://snapshots.wwpdb.org, https://s3snapshots.rcsb.org (AWS), and ftp://snapshots.pdbj.org. Snapshots have been archived annually since 2005 to provide readily identifiable data sets for research on the PDB archive.

The directory 20240101 includes the 214,121 experimentally-determined structures and experimental data available at that time. Atomic coordinates and related metadata are available in PDBx/mmCIF, PDB, and XML file formats. The date and time stamp of each file indicates the last time the file was modified. The snapshot of the PDB Core archive is 1,242 GB.

A snapshot of the EMDB Core archive (ftp://ftp.ebi.ac.uk/pub/databases/emdb/) as of January 1, 2024 can be found at ftp://ftp.ebi.ac.uk/pub/databases/emdb_vault/20240101/ and ftp://snapshots.pdbj.org/20240101/. The snapshot of the EMDB Core archive contains map files and their metadata within XML files for both released and obsoleted entries (32,033 and 282, respectively) and is 14 TB in size.

A new paper describes how the recently-announced NextGen Archive provides centralized access to integrated annotations and enriched structural information for PDB data:

NextGen Archive: Centralising Access to Integrated Annotations and Enriched Structural Information by the Worldwide Protein Data Bank
Preeti Choudhary, Zukang Feng, John Berrisford, Henry Chao, Yasuyo Ikegawa, Ezra Peisach, Dennis W. Piehl, James Smith, Ahsan Tanweer, Mihaly Varadi, John D. Westbrook, Jasmine Y. Young, Ardan Patwardhan, Kyle L. Morris, Jeffrey C. Hoch, Genji Kurisu, Sameer Velankar, Stephen K. Burley
(2023) bioRxiv https://doi.org/10.1101/2023.10.24.563739

The PDB NextGen archive provides sequence annotation from external resources such as UniProt, SCOP2 and Pfam in addition to the content provided in the structure model files in the PDB main archive. The inclusion of UniProtKB numbering facilitates structural comparisons between experimental and predicted protein models. These PDBx/mmCIF files are directly compatible with various data visualization tools, simplifying the display of annotations on 3D structure views.

Community recommendations on cryoEM data archiving and validation
Gerard J. Kleywegt, Paul D. Adams, Sarah J. Butcher, Cathy Lawson, Alexis Rohou, Peter B. Rosenthal, Sriram Subramaniam, Maya Topf, Sanja Abbott, Philip R. Baldwin, John M. Berrisford, Gérard Bricogne, Preeti Choudhary, Tristan I. Croll, Radostin Danev, Sai J. Ganesan, Timothy Grant, Aleksandras Gutmanas, Richard Henderson, J. Bernard Heymann, Juha T. Huiskonen, Andrei Istrate, Takayuki Kato, Gabriel C. Lander, Shee-Mei Lok, Steven J. Ludtke, Garib N. Murshudov, Ryan Pye, Grigore D. Pintilie, Jane S. Richardson, Carsten Sachse, Osman Salih, Sjors H.W. Scheres, Gunnar F. Schroeder, Carlos Oscar S. Sorzano, Scott M. Stagg, Zhe Wang, Rangana Warshamanage, John D. Westbrook, Martyn D. Winn, Jasmine Y. Young, Stephen K. Burley, Jeffrey C. Hoch, Genji Kurisu, Kyle Morris, Ardan Patwardhan, Sameer Velankar
(2024) IUCrJ 11: 140–151 https://doi.org/10.1107/S2052252524001246

EMDB—the Electron Microscopy Data Bank
The wwPDB Consortium
(2024) Nucleic Acids Research 52: D456–D465 https://doi.org/10.1093/nar/gkad1019

Restraint Validation of Biomolecular Structures Determined by NMR in the Protein Data Bank
Kumaran Baskaran, Eliza Ploskon, Roberto Tejero, Masashi Yokochi, Deborah Harrus, Yuhe Liang, Ezra Peisach, Irina Persikova, Theresa A Ramelot, Monica Sekharan, James Tolchard, John D Westbrook, Benjamin Bardiaux, Charles Schwieters, Ardan Patwardhan, Sameer Velankar, Stephen K Burley, Genji Kurisu, Jeffrey C Hoch, Gaetano T Montelione, Geerten W Vuister, Jasmine Y Young
(2024) Structure https://doi.org/10.1016/j.str.2024.02.011 

Chairman of the Protein Research Foundation, Prof. Toshiharu Hase, and Dr. Minyu Chen

Congratulations to PDBj biocurator Minyu Chen on processing over 10,000 PDB depositions. She is the second biocurator to reach this milestone at PDBj and the fifth across the wwPDB. Yumiko Kengaku reached the same milestone in April 2021.