DATA QUERY, REPORTING AND ACCESS

RCSB PDB BETA SITE FEATURES

In July 2004, the RCSB PDB released a reengineered beta site (pdbbeta.rcsb.org) for public testing. Some of the features of this site are described below.

Comments and suggestions about the beta site are welcomed at betafeedback@rcsb.org.

  • Improved Searching and Visualization for Ligands

    Figure: The PDB can be searched for structures containing the same ligand by drawing the ligand in MarvinSketch.

    Beta site searches can use common ligand names or the identification codes from the Chemical Component Dictionary (formerly called the HET group dictionary). These queries search ligand names, some synonyms, and class specifications in the Chemical Component Dictionary, which is curated by the RCSB PDB team.

    Ligand name searching supports partial string matches. For example, searching for 'benz' returns all structures that contain benzene as well as those that contain benzamidine. For an exact match, the complete name of the ligand must be entered. Ligand searches can also be performed using the three-character ligand ID that appears in the HET records of the PDB file. For example, searching for 'HEM' returns all structures that have a heme ligand.
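    The name and ID lookups described above can be sketched as follows. This is a minimal illustration, not the RCSB implementation: the three-entry CCD_NAMES table is a hand-made excerpt (names as recalled from the Chemical Component Dictionary), and the case-insensitive matching rule and the functions search_by_name and search_by_id are assumptions made for the example.

```python
from typing import Dict, List

# Illustrative excerpt of ligand ID -> name pairs in the style of the
# Chemical Component Dictionary (not the real dictionary file or its format).
CCD_NAMES: Dict[str, str] = {
    "BNZ": "BENZENE",
    "BEN": "BENZAMIDINE",
    "HEM": "PROTOPORPHYRIN IX CONTAINING FE",
}


def search_by_name(term: str, exact: bool = False) -> List[str]:
    """Return ligand IDs whose name matches `term` (case-insensitive).

    A partial search matches any name containing the term as a substring;
    an exact search requires the complete name.
    """
    needle = term.strip().upper()
    if not needle:
        return []
    return sorted(
        lig_id
        for lig_id, name in CCD_NAMES.items()
        if (name == needle if exact else needle in name)
    )


def search_by_id(code: str) -> List[str]:
    """Return the ligand ID if the three-character HET code is known."""
    code = code.strip().upper()
    return [code] if code in CCD_NAMES else []


print(search_by_name("benz"))                 # ['BEN', 'BNZ']
print(search_by_name("benzene", exact=True))  # ['BNZ']
print(search_by_id("hem"))                    # ['HEM']
```

    Mapping the returned ligand IDs to the PDB entries that contain them is a separate lookup that this sketch leaves out.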

    A recently added search feature is the ability to query for ligands using a SMILES string or a 2D structure of the ligand drawn with the MarvinSketch applet. SMILES (Simplified Molecular Input Line Entry Specification) is a comprehensive yet simple line notation for chemical structures; a SMILES string represents the valence model of a molecule.

    For example, [Fe+2] or [Fe++] is the SMILES string for the iron(II) cation; C1=CC=CC=C1 or c1ccccc1 is the SMILES string for benzene; and Nc1ncnc2n(cnc12)C3OC(COP(O)(=O)OP(O)(=O)OP(O)(O)=O)C(O)C3O is the SMILES string for adenosine 5'-triphosphate (ATP).

    Exact, substructure, and similarity searches can be performed. An exact search retrieves PDB IDs associated with ligands whose structures match the SMILES/2D structure exactly. A substructure search retrieves all PDB IDs associated with ligands that contain the query as a substructure (that is, ligands that are superstructures of the query). A similarity search returns PDB IDs associated with ligands whose topology is similar to that of the query.
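    Exact and substructure searches are, at heart, graph-matching problems. The sketch below uses a plain backtracking subgraph matcher as a stand-in technique; it is not the RCSB search engine. The molecule model (atom index to element, plus a set of undirected bonds) and the helpers ring, is_substructure and is_exact_match are hypothetical, and bond orders, charges and aromaticity are deliberately ignored, so the benzamidine skeleton here is simplified.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical minimal molecule model: atom index -> element symbol, plus a
# set of undirected bonds. Bond orders, charges and aromaticity are ignored.
Atoms = Dict[int, str]
Bonds = Set[FrozenSet[int]]
Mol = Tuple[Atoms, Bonds]


def ring(n: int, element: str = "C") -> Mol:
    """Build an n-membered ring of a single element (e.g. a benzene skeleton)."""
    atoms = {i: element for i in range(n)}
    bonds = {frozenset((i, (i + 1) % n)) for i in range(n)}
    return atoms, bonds


def is_substructure(query: Mol, target: Mol) -> bool:
    """Backtracking search for a mapping of query atoms onto distinct target
    atoms with the same element, such that every query bond maps onto a
    target bond."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    order: List[int] = list(q_atoms)

    def extend(mapping: Dict[int, int]) -> bool:
        if len(mapping) == len(order):
            return True
        qa = order[len(mapping)]
        used = set(mapping.values())
        for ta, element in t_atoms.items():
            if ta in used or element != q_atoms[qa]:
                continue
            # Bonds from qa to already-mapped query atoms must exist in the target.
            if all(frozenset((ta, mapping[qb])) in t_bonds
                   for qb in mapping if frozenset((qa, qb)) in q_bonds):
                mapping[qa] = ta
                if extend(mapping):
                    return True
                del mapping[qa]  # undo and try the next candidate atom
        return False

    return extend({})


def is_exact_match(query: Mol, target: Mol) -> bool:
    """Equal atom and bond counts plus a substructure mapping means the two
    graphs are isomorphic."""
    return (len(query[0]) == len(target[0])
            and len(query[1]) == len(target[1])
            and is_substructure(query, target))


# Benzamidine skeleton: a six-carbon ring carrying C(=N)N (bond orders dropped).
benzene = ring(6)
b_atoms, b_bonds = ring(6)
b_atoms.update({6: "C", 7: "N", 8: "N"})
b_bonds |= {frozenset((0, 6)), frozenset((6, 7)), frozenset((6, 8))}
benzamidine = (b_atoms, b_bonds)

print(is_substructure(benzene, benzamidine))  # True
print(is_exact_match(benzene, benzamidine))   # False
print(is_exact_match(benzene, ring(6)))       # True
```

    Benzamidine is a superstructure of the benzene query, so it is a substructure hit but not an exact hit. Backtracking is exponential in the worst case; production systems typically screen candidates with fingerprints before running a full match.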

    Similarity searches find molecules similar to the query using a dissimilarity coefficient, defined as 1 − Tanimoto coefficient, whose threshold can be set anywhere from 0 to 1. The default value is 0.3. A higher threshold returns more hits, because ligands that are only remotely similar to the query are also returned. A lower threshold is more stringent: it returns fewer hits, restricted to ligands that are more similar to the query structure.
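    The threshold behavior can be made concrete with a small sketch. It assumes each ligand is already represented as a set of "on" fingerprint bits (how the site derives fingerprints is not described here), treats a dissimilarity equal to the threshold as a hit, and uses made-up fingerprints for hypothetical ligands LIG_A to LIG_D; the function names are illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of "on" bits in a structural fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits / bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def dissimilarity(a: Fingerprint, b: Fingerprint) -> float:
    """Dissimilarity coefficient: 1 - Tanimoto."""
    return 1.0 - tanimoto(a, b)


def similarity_search(query: Fingerprint,
                      library: Dict[str, Fingerprint],
                      threshold: float = 0.3) -> List[Tuple[str, float]]:
    """Return (ligand ID, dissimilarity) pairs at or below the threshold,
    most similar first."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    scored = [(lig, dissimilarity(query, fp)) for lig, fp in library.items()]
    return sorted((hit for hit in scored if hit[1] <= threshold),
                  key=lambda hit: hit[1])


# Made-up fingerprints for hypothetical ligands, chosen only to show the
# effect of the threshold.
query = frozenset(range(1, 11))                        # bits 1..10
library = {
    "LIG_A": frozenset(range(1, 11)),                  # identical: d = 0.0
    "LIG_B": frozenset(range(1, 10)),                  # 9 of 10 bits shared: d = 0.1
    "LIG_C": frozenset(list(range(1, 9)) + [11, 12]),  # 8 of 12 shared: d ~ 0.33
    "LIG_D": frozenset({20, 21, 22}),                  # nothing shared: d = 1.0
}

print([lig for lig, _ in similarity_search(query, library)])       # ['LIG_A', 'LIG_B']
print([lig for lig, _ in similarity_search(query, library, 0.5)])  # ['LIG_A', 'LIG_B', 'LIG_C']
```

    Raising the threshold from the default 0.3 to 0.5 admits LIG_C, which shares only two thirds of its bits with the query, illustrating why a higher value returns more, but more remotely related, hits.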

    Both the SMILES string search and the 2D structure search list all the ligands that match the query and the criteria. The matching ligands can be viewed and further explored. A 2D ligand viewer can also be launched from the Structure Explorer page for a single structure.

  • Molecular Viewers

    The beta site features three third-party molecular viewers for interactively visualizing structures: KiNG, Jmol, and WebMol. Links to all three viewers are found on each entry's Structure Explorer page. The viewers require a Java-enabled browser but no additional plug-ins or helper applications. However, some applets may require the user to accept a security certificate upon first download (click "Yes" or "Always").

    All three viewers offer rich functionality for visualizing molecules, with many options for selecting, rendering, and coloring portions of or entire PDB structures.

    KiNG, by default, presents a colored ribbon diagram of the structure. The top menu contains many visualization options, along with additional tools and the ability to save and later reload the currently displayed view. The right-hand panel contains a list of check boxes that determine which molecular entities in the structure file are displayed. This option extends to the models in NMR structures, making it particularly convenient for comparing multiple NMR models in a single PDB file.

    Jmol offers a large number of options for selecting portions of a structure and rendering them in different ways (for example, showing a space-filling representation of a ligand and a ribbon diagram of the main protein chain). These options are accessed by right-clicking on the applet and choosing "Select", "Render", and "Color" from the cascading menu. Other convenient features include the ability to spin the structure continuously and to display the crystal axes or unit cell boundaries.

    WebMol presents a stick model of the molecule with several options for coloring (e.g. by chain or by B-factor). Dotted molecular surfaces can also be displayed. Distance matrix plots or Ramachandran plots can be opened in a separate window, which is interactively linked to the display of the molecule.

    For further help, KiNG and WebMol help are available from within the applets, and Jmol help is available at the Jmol home page.
  • PDBML/XML data uniformity files

    After an extended period of beta testing, the remediated PDB data files from the data uniformity project are now available in PDBML/XML format on the production FTP archive in the following directories:

    The files in the XML directory contain separate XML tags for each item in the atom_site category. XML-extatom files contain only the atom records, in an alternate format with a single pair of XML tags for each atom. XML-noatoms files contain only the metadata for each structure and no atom records. All files are gzip-compressed (.gz). In each case, the data files are in the usual hash directories named by the middle two characters of the PDB ID (e.g., the files for 100d are in hash directory 00).
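    The hash-directory rule and the gzip compression can be handled with the Python standard library, as in the minimal sketch below. The helper names are hypothetical, and the sketch deliberately takes the file name as an argument rather than assuming the archive's naming convention; the mirror root passed to local_path is likewise up to the caller.

```python
import gzip
import os
import xml.etree.ElementTree as ET


def hash_dir(pdb_id: str) -> str:
    """Hash directory name: the middle two characters of a four-character PDB ID."""
    pdb_id = pdb_id.strip().lower()
    if len(pdb_id) != 4:
        raise ValueError(f"expected a four-character PDB ID, got {pdb_id!r}")
    return pdb_id[1:3]


def local_path(root: str, pdb_id: str, filename: str) -> str:
    """Path of a file in a local mirror laid out by hash directory.
    `root` and `filename` are supplied by the caller; no file naming
    convention is assumed here."""
    return os.path.join(root, hash_dir(pdb_id), filename)


def load_gzipped_xml(path: str) -> ET.Element:
    """Decompress a .gz file on the fly and parse it as XML."""
    with gzip.open(path, "rb") as handle:
        return ET.parse(handle).getroot()


print(hash_dir("100d"))  # 00
```

    For large coordinate files, xml.etree.ElementTree.iterparse can be used instead of parse to stream elements rather than loading the whole tree into memory.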

    Comments are welcome at info@rcsb.org.

    PDB Focus: Redundancy Reduction Cluster Data Available on the PDB FTP Site

    The results of the weekly clustering of protein chains in the PDB are posted at ftp://ftp.rcsb.org/pub/pdb/derived_data/NR/. These clusters are used by the "remove similar sequences" feature of SearchLite and SearchFields on the PDB Web sites.

    Files that list the clusters and their rankings at 50%, 70%, and 90% sequence identity are available. Smaller rank numbers indicate higher (better) ranking; the chain with rank number 1 is the best representative of its cluster.
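    A "remove similar sequences" filter built on these rankings might look like the sketch below. It assumes one clustering level (say 90% identity) has already been loaded into a hypothetical mapping from chain identifier to (cluster number, rank); parsing the actual FTP files is omitted because their layout is documented in the README rather than here. The chain identifiers E1:A and so on are placeholders, not real entries.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical in-memory form of one clustering level:
# chain identifier -> (cluster number, rank within that cluster).
Membership = Dict[str, Tuple[int, int]]


def remove_similar(hits: List[str], membership: Membership) -> List[str]:
    """Keep only the best-ranked chain (smallest rank number) from each
    cluster among `hits`, preserving the original order of the survivors.
    Chains missing from `membership` are treated as singleton clusters."""
    best: Dict[Hashable, Tuple[int, str]] = {}
    for chain in hits:
        key: Hashable
        if chain in membership:
            key, rank = membership[chain]
        else:
            key, rank = ("unclustered", chain), 1
        if key not in best or rank < best[key][0]:
            best[key] = (rank, chain)
    kept = {chain for _, chain in best.values()}
    result: List[str] = []
    for chain in hits:
        if chain in kept and chain not in result:
            result.append(chain)
    return result


membership_90 = {"E1:A": (1, 3), "E2:A": (1, 1), "E3:B": (2, 1), "E4:A": (2, 2)}
hits = ["E1:A", "E4:A", "E2:A", "E3:B", "E5:C"]
print(remove_similar(hits, membership_90))  # ['E2:A', 'E3:B', 'E5:C']
```

    Switching between the 50%, 70%, and 90% levels would simply mean passing a different membership mapping.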

    The contents of these files and the details of the clustering and ranking are further described at ftp://ftp.rcsb.org/pub/pdb/derived_data/NR/README and http://www.rcsb.org/pdb/redundancy.html.

    WEBSITE STATISTICS

    The PDB is available from several Web and FTP sites located around the world. Users are also invited to preview new features at the RCSB PDB beta test site, accessible at pdbbeta.rcsb.org.

    The access statistics are given for the primary RCSB PDB website at www.pdb.org.