Published quarterly by the Research Collaboratory
for Structural Bioinformatics Protein Data Bank

Spring 2010
Number 45

 
NEWSLETTER
  Contents  
Message from the RCSB PDB
Data Deposition
Deposition Statistics
wwPDB News: Changes to the wwPDB Policy for Depositing Polypeptide Structures
wwPDB News: Initial Release REVDAT Dates
Data Query, Reporting and Access
Website Statistics
Improved Ligand Searching
Comparison Tool for Exploring Sequence and Structure Alignments
Advanced Search: Sequence Motif
Time-stamped Copies of PDB Archive Available via FTP
Outreach and Education
Online Narrated Tutorial Demonstrates How to Use the RCSB PDB
Poster Download: How Do Drugs Work?
NJ Science Olympiad Protein Modeling Results
Recent and Upcoming Meetings and Events
Education Corner
Robert C. Bateman, Jr. and Paul A. Craig: A Proficiency Rubric for Biomacromolecular 3D Literacy
RCSB PDB PARTNERS, MANAGEMENT, AND STATEMENT OF SUPPORT
 
     

DATA QUERY, REPORTING AND ACCESS

Website Statistics

Website access statistics for the first quarter of 2010 are given below.

Month            Unique Visitors    Visits     Bandwidth
January 2010     176,655            422,065    755.87 GB
February 2010    184,306            434,442    783.75 GB
March 2010       210,308            510,434    1189.17 GB


Improved Ligand Searching

Searching for ligands using the RCSB PDB's Advanced Search, Chemical Structure Search, or top-bar pulldown search has been improved.

The Chemical Name search from the top-bar pulldown menu on every page returns matches to names of small molecules in the wwPDB Chemical Component Dictionary and any synonyms. Searches by Chemical Name or Chemical ID will return structures with matching components that are free ligands or are part of a protein or nucleic acid chain.

In the Advanced Search, searches can be customized to look for free and/or polymeric chemical components. A "sounds like" option searches for misspelled or incomplete chemical component names. Advanced Searches using SMILES strings take a similarity (instead of dissimilarity) threshold and allow the polymeric type to be specified.

The Chemical Structure Search (available from the left-hand Search menu under Chemical Components) uses the latest version of the MarvinSketch [1] applet (5.3.0.1).

Users can perform an exact, similar, or substructure search by drawing components or by loading and editing SMILES strings or chemical IDs in the tool.

Screencasts are available to help explore these features at www.rcsb.org/pdb/static.do?p=general_information/screencasts.jsp


Comparison Tool for Exploring Sequence and Structure Alignments

The Comparison Tool calculates pairwise sequence and structure alignments using different methods. This feature is available on the Compare Structures web page (under Tools in the left menu) and as a downloadable web widget.

The sequence alignment methods currently available are blast2seq [2], Needleman-Wunsch [3], and Smith-Waterman [4]. Structure alignments can be performed using FATCAT [5] and CE [6] through a Java applet launched from the RCSB PDB site. Mammoth [7], TM-Align [8], and TopMatch [9] structure alignments will be launched at their related external sites.
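To illustrate the difference between the global (Needleman-Wunsch) and local (Smith-Waterman) alignment methods named above, here is a minimal Python sketch that computes only the optimal alignment score. It is not the Comparison Tool's implementation. The function name `alignment_score` and the simple match, mismatch, and linear gap scores are assumptions chosen for illustration; production tools typically use substitution matrices and more elaborate gap penalties.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Return the optimal alignment score of sequences a and b.

    local=False: global alignment in the style of Needleman-Wunsch.
    local=True:  local alignment in the style of Smith-Waterman.
    The scoring values are illustrative assumptions, not RCSB PDB settings.
    """
    # prev[j] is the best score for aligning the prefix of a seen so far
    # with b[:j]. Local alignment starts every border cell at zero.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0  # highest cell value seen, used only for local alignment
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            score = max(diag, up, left)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best if local else prev[-1]


# Two sequences sharing the core "GATTACA" but with unrelated flanks.
print(alignment_score("XXGATTACAYY", "ZZGATTACAWW"))              # 3
print(alignment_score("XXGATTACAYY", "ZZGATTACAWW", local=True))  # 7
```

The global score is pulled down by the mismatched flanking residues, while the local score reflects only the shared core.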

This functionality is also integrated with the Sequence Clusters offered from each entry's Sequence Similarity tab. For example, the Sequence Similarity tab for entry 4hhb offers sequence clusters at different similarity cutoffs. Users can select a pair of chains from a given sequence cluster (95% for this example), and then run the sequence or structure alignments available from the Comparison Tool.


Advanced Search: Sequence Motif

Advanced Search lets users build queries on specific types of data. To look for structures with a particular sequence motif, try one of these techniques with the Sequence Features > Sequence Motif option. Users can query for an exact sequence or for a sequence pattern using regular expression syntax, as shown below. Regular expressions are a powerful notation for defining complex sequence patterns.

  • Short Sequence Fragments
    The sequence motif search, unlike BLAST or FASTA, can search for short sequence fragments of any size, such as NPPTP

  • Wildcard Searches
    Use an 'X' in the sequence for wildcard searching. For example, XPPXP can be entered to look for SH3 domains using the consensus sequence -X-P-P-X-P (where X is a variable residue and P is proline)

  • Multiples of Variable Residues
    The {n} notation can be used, where n is the number of variable residues. To query a motif with 7 variable residues between W and G, and 20 variable residues between G and L, try WX{7}GX{20}L

  • Ranges of Variable Residues
    The {n,m} notation can be used to indicate ranges of variable residues, where n is the minimum and m the maximum number of repetitions. For example, the zinc finger motif that binds Zn in a DNA-binding domain can be expressed as: CX{2,4}CX{12}HX{3,5}H

  • Motifs at the Beginning of a Sequence
    The '^' operator searches for sequence motifs at the beginning of a protein sequence. Two ways of looking for sequences with N-terminal histidine tags are: ^HHHHHH and ^H{6}

  • Alternative Residues
    Square brackets specify alternative residues at a particular position. To search for a Walker (P loop) motif that binds ATP or GTP, try: [AG]XXXXGK[ST]
    The search will look for sequences with A or G, followed by 4 variable residues, then G K, and finally S or T.
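The motif notation described above maps closely onto standard regular expressions, so the example patterns can be tried locally on a sequence string. The Python sketch below is only an illustration and not the RCSB PDB search itself. The helper names `motif_to_regex` and `has_motif` are hypothetical, the test sequences are made up, and the translation assumes that 'X' means "any single residue" and that every other symbol already has its usual regular expression meaning.

```python
import re


def motif_to_regex(motif):
    """Translate the motif notation into a compiled Python regex.

    Assumption: 'X' is a wildcard for one residue, so it becomes '.'.
    {n}, {n,m}, '^' and [..] are passed through unchanged.
    """
    return re.compile(motif.replace("X", "."))


def has_motif(motif, sequence):
    """Return True if the motif occurs anywhere in the sequence."""
    return motif_to_regex(motif).search(sequence) is not None


# Made-up sequences, built so that each motif from the list is present.
zinc_finger = "C" + "AA" + "C" + "A" * 12 + "H" + "AAA" + "H"
spaced = "W" + "A" * 7 + "G" + "A" * 20 + "L"

print(has_motif("NPPTP", "MKNPPTPLL"))                    # short fragment
print(has_motif("XPPXP", "AAAPPLPRR"))                    # wildcards
print(has_motif("WX{7}GX{20}L", spaced))                  # fixed repeats
print(has_motif("CX{2,4}CX{12}HX{3,5}H", zinc_finger))    # ranges
print(has_motif("^H{6}", "HHHHHHMKT"))                    # N-terminal tag
print(has_motif("[AG]XXXXGK[ST]", "MTEYGAGGVGKSALT"))     # alternatives
```

Each call prints True for these sequences. Moving the histidine run away from the start of the sequence, as in "MHHHHHHKT", makes the `^H{6}` pattern fail because '^' anchors the match to the first residue.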


Time-stamped Copies of PDB Archive Available via FTP

A snapshot of the PDB archive (ftp.wwpdb.org) as of January 4, 2010 has been added to ftp://snapshots.wwpdb.org/. Snapshots of the PDB have been archived annually since 2004. It is hoped that these snapshots will provide readily identifiable data sets for research on the PDB archive.

The directory 20100104 includes the 62,388 experimentally-determined coordinate files and related experimental data that were available at that time. Coordinate data are available in PDB, mmCIF, and XML formats. The date and time stamp of each file indicate the last time the file was modified.

The script at ftp://snapshots.wwpdb.org/rsyncSnapshots.sh may be used to make a local copy of a snapshot or sections of the snapshot.


  1. ChemAxon Ltd. MarvinSketch. http://www.chemaxon.com/marvin/release-notes.html.
  2. T. A. Tatusova & T. L. Madden. (1999) BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol Lett 174, 247-250.
  3. S. B. Needleman & C. D. Wunsch. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48, 443-453.
  4. T. F. Smith & M. S. Waterman. (1981) Identification of common molecular subsequences. J Mol Biol 147, 195-197.
  5. Y. Ye & A. Godzik. (2003) Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics 19, ii246-255.
  6. I. N. Shindyalov & P. E. Bourne. (1998) Protein structure alignment by incremental combinatory extension of the optimum path. Protein Eng 11, 739-747.
  7. A. R. Ortiz, C. E. Strauss & O. Olmea. (2002) MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci 11, 2606-2621.
  8. Y. Zhang & J. Skolnick. (2005) TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res 33, 2302-2309.
  9. M. J. Sippl & M. Wiederstein. (2008) A note on difficult structure alignment problems. Bioinformatics 24, 426-427.
 
  Participating RCSB Members: Rutgers • SDSC/SKAGGS/UCSD
E-mail: info@rcsb.org • Web: www.pdb.org • FTP: ftp.wwpdb.org
The RCSB PDB is a member of the wwPDB (www.wwpdb.org)
©2009 RCSB PDB  
 