REMEDIATION PROJECT DETAILS

Accessing the PDB Archive

The PDB archive has been remediated by the wwPDB members: the RCSB PDB, MSD-EBI, PDBj, and the BMRB. It can be accessed at ftp://ftp.wwpdb.org.

New files processed and released into the archive by the wwPDB sites will reflect the new features incorporated as part of this project, including standardized IUPAC¹ nomenclature for all chemical components.

Users may have to download new software to properly view the files with the new nomenclature (e.g., RasMol, Chimera)². Links to resources are available at www.wwpdb.org.

A snapshot of the unremediated PDB archive (as of July 31, 2007) is available at ftp://snapshots.rcsb.org.


Remediation of the Entire PDB Archive

Highlights of the types of information improved through remediation include:


Sequence

  • Updated references to databases and taxonomies

  • Resolved differences between chemical and macromolecular sequences

Citation

  • Verified and updated primary citation assignments

Assembly and Virus Information

  • Improved representation of deposited and experimental coordinate frames, symmetry, and frame transformations

Nucleic Acid Labeling

  • Deoxyribonucleotides and ribonucleotides assigned separate chemical definitions

Beamline Data

  • Beamline and synchrotron facility names made consistent with BioSync

Chemical Components

  • Standardized chemistry and nomenclature in monomers and ligands



Remediated data are available for each PDB entry in three formats:

  • mmCIF (mmcif.pdb.org). All remediation work was done using the PDB Exchange Dictionary (PDBx) that follows the mmCIF syntax.

  • PDBML-XML (pdbml.pdb.org). Remediated data files are also available in PDBML-XML format, as a direct translation of the mmCIF files.

  • PDB File Format (wwpdb.org). The remediated files have been released in PDB File Format version 3.0. This version of the file format incorporates standardized atom nomenclature, and distinguishes deoxyribonucleic acid from ribonucleic acid.
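As a rough illustration of the DNA/RNA distinction in PDB File Format version 3.0, the sketch below classifies residues by name. It assumes the version 3.0 convention that standard deoxyribonucleotides are named DA, DC, DG, and DT while ribonucleotides are named A, C, G, and U, and it reads the residue name, chain ID, and residue number from their fixed columns in ATOM/HETATM records. The function names and the `atom_line` helper are hypothetical, introduced only for this example; modified nucleotides are not handled.

```python
# Residue names used for standard nucleotides in PDB File Format version 3.0.
DNA_NAMES = {"DA", "DC", "DG", "DT"}
RNA_NAMES = {"A", "C", "G", "U"}


def classify_residue(res_name: str) -> str:
    """Return 'DNA', 'RNA', or 'other' for a three-character residue name field."""
    name = res_name.strip()
    if name in DNA_NAMES:
        return "DNA"
    if name in RNA_NAMES:
        return "RNA"
    return "other"


def nucleic_acid_types(pdb_text: str) -> dict:
    """Map (chain ID, residue number) to 'DNA' or 'RNA' using ATOM/HETATM records."""
    found = {}
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        kind = classify_residue(line[17:20])   # columns 18-20: residue name
        if kind != "other":
            chain_id = line[21]                # column 22: chain identifier
            res_seq = int(line[22:26])         # columns 23-26: residue number
            found[(chain_id, res_seq)] = kind
    return found


def atom_line(serial: int, atom: str, res: str, chain: str, seq: int) -> str:
    """Hypothetical helper: build the leading columns of an ATOM record for the demo."""
    return f"ATOM  {serial:>5} {atom:<4} {res:>3} {chain}{seq:>4}"


sample = "\n".join([
    atom_line(1, "P", "DA", "A", 1),    # deoxyadenosine
    atom_line(2, "P", "A", "B", 1),     # adenosine
    atom_line(3, "CA", "ALA", "C", 5),  # amino acid, ignored
])
print(nucleic_acid_types(sample))  # {('A', 1): 'DNA', ('B', 1): 'RNA'}
```

Files written before the remediation may use the same one-letter names for both DNA and RNA, so a check like this is only meaningful for remediated files.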




The Chemical Component Dictionary


Image of 407d³ created using the remediated data file and the latest patch to OpenRasmol (2.7.3.1)

The Chemical Component Dictionary (formerly known as the "HET dictionary") describes all residues in the PDB, standard and non-standard, and all small molecules. It has been remediated to address the inconsistencies in older dictionary entries that resulted in valence problems, missing model coordinates, redundant ligands, and more.

The features of this dictionary include:

  • Standard nomenclature

  • Model coordinates have been corrected, redundant chemical components have been obsoleted, and additional definitions for protonated forms have been provided.

  • Stereochemical assignments, aromatic bond assignments, idealized coordinates, chemical descriptors (SMILES & InChI)⁴, and systematic chemical names have been added.

The full Chemical Component Dictionary and the companion Amino Acid Variants Dictionary can be downloaded from remediation.wwpdb.org/downloads.html.

Users can also search for individual chemical components, either by entering the component ID in the form provided, or by browsing by ID. The variant dictionary can also be browsed.

For each chemical component in the dictionary, a summary page provides a 2D chemical diagram and a 3D graphic (using Jmol) of the ligand. This page also describes the ligand's physical and chemical features. Status information is provided, along with links to the component definition in CIF and PDBML/XML formats, model coordinates, idealized coordinates, and chemical diagrams.
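To give a feel for what a component definition in CIF format looks like, the sketch below reads the simple one-line `_category.item value` pairs from an abbreviated fragment. The fragment is an illustrative excerpt for alanine, not a complete dictionary entry, and the `read_simple_items` function is a hypothetical helper written for this example. It deliberately ignores `loop_` tables and multi-line values, so a full mmCIF parser would be needed for real dictionary files.

```python
def read_simple_items(cif_text: str) -> dict:
    """Collect single-line `_category.item value` pairs from mmCIF-style text.

    Lines that are not data items, and item names with no value on the same
    line (as in loop_ headers), are skipped.
    """
    items = {}
    for raw in cif_text.splitlines():
        line = raw.strip()
        if not line.startswith("_"):
            continue
        parts = line.split(None, 1)  # item name, then the rest of the line
        if len(parts) != 2:
            continue
        key, value = parts[0], parts[1].strip()
        # Remove one pair of matching quotes around the value, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        items[key] = value
    return items


# Abbreviated, illustrative fragment of a chemical component definition.
fragment = """\
data_ALA
#
_chem_comp.id        ALA
_chem_comp.name      ALANINE
_chem_comp.type      "L-PEPTIDE LINKING"
_chem_comp.formula   "C3 H7 N O2"
#
loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
ALA N
ALA CA
"""

component = read_simple_items(fragment)
print(component["_chem_comp.id"], component["_chem_comp.formula"])
```

The atom table under `loop_` is skipped here; reading it would require tracking the column names and the rows that follow them.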






Accessing the Remediated Data from the RCSB PDB Website

The latest release of the RCSB PDB website utilizes the data from the wwPDB Remediation Project.

This new site offers:

  • Improved searching and reporting capabilities

  • Updated sequence references

  • Updated primary citation information and links

  • Better representations for complex assemblies (such as viruses)

  • Access to both remediated and pre-remediation data

  • Advanced access to ligand information

  • Enhanced sequence details page for each structure


1 Pure & Applied Chem., 70, 117-142, 1998

2 see sourceforge.net/projects/openrasmol and www.cgl.ucsf.edu/chimera

3 407d: C.L. Kielkopf, S. White, J.W. Szewczyk, J.M. Turner, E.E. Baird, P.B. Dervan, D.C. Rees (1998) A structural basis for recognition of A-T and T-A base pairs in the minor groove of B-DNA. Science 282:111-115

4 D. Weininger (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28:31-36; and © The International Union of Pure and Applied Chemistry (2005) IUPAC International Chemical Identifier (InChI) (contact: secretariat@iupac.org)