RCSB PDB Newsletter | Winter 2010

Month	Unique Visitors	Visits	Bandwidth
January 2010	176655	422065	755.87 GB
February 2010	184306	434442	783.75 GB
March 2010	210308	510434	1189.17 GB

Improved Ligand Searching

Searching for ligands using the RCSB PDB's Advanced Search, Chemical Structure Search, or top-bar pulldown search has been improved.

The Chemical Name search from the top-bar pulldown menu on every page returns matches to names of small molecules in the wwPDB Chemical Component Dictionary and any synonyms. Searches by Chemical Name or Chemical ID will return structures with matching components that are free ligands or are part of a protein or nucleic acid chain.

In the Advanced Search, searches can be customized to look for free and/or polymeric chemical components. A "sounds like" option finds chemical components even when the name entered is misspelled or incomplete. Advanced Searches using SMILES strings now take a similarity threshold (instead of a dissimilarity threshold) and allow the polymeric type to be specified.

The Chemical Structure Search (available from the left-hand Search menu under Chemical Components) uses the latest version of the MarvinSketch¹ applet (5.3.0.1).

Users can perform exact, similarity, or substructure searches by drawing components in the tool or by loading and editing SMILES strings or chemical IDs.

Screencasts are available to help explore these features at www.rcsb.org/pdb/static.do?p=general_information/screencasts.jsp

Comparison Tool for Exploring Sequence and Structure Alignments

The Comparison Tool calculates pairwise sequence and structure alignments using different methods. This feature is available on the Compare Structures web page (under Tools in the left menu) and as a downloadable web widget.

The sequence alignment methods currently available are blast2seq,² Needleman-Wunsch,³ and Smith-Waterman.⁴ Structure alignments can be performed using FATCAT⁵ and CE⁶ through a Java applet launched from the RCSB PDB site. Mammoth,⁷ TM-Align,⁸ and TopMatch⁹ structure alignments are launched at their respective external sites.
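As a rough illustration of how the two classic sequence methods differ, the sketch below scores a global (Needleman-Wunsch style) and a local (Smith-Waterman style) alignment with the same dynamic-programming recurrence. The function names and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for readability; they are not the parameters used by the Comparison Tool, and production tools typically rely on substitution matrices and affine gap penalties.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch style score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]  # first row: leading gaps only
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first column: leading gaps only
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman style score: best-scoring pair of subsequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at zero lets an alignment restart anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    x = "TTTGATTACATTT"  # toy sequences sharing the fragment GATTACA
    y = "CCGATTACACC"
    print("global:", global_score(x, y))
    print("local: ", local_score(x, y))
```

The local score rewards the shared fragment and ignores the unrelated flanking residues, while the global score is pulled down by them. This is why a local method is usually the better choice when only a domain or motif is expected to be shared.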

This functionality is also integrated with the Sequence Clusters offered from each entry's Sequence Similarity tab. For example, the Sequence Similarity tab for entry 4hhb offers sequence clusters at different similarity cutoffs. Users can select a pair of chains from a given sequence cluster (95% for this example), and then run any of the sequence or structure alignments available from the Comparison Tool.

Advanced Search: Sequence Motif

Advanced Search lets users build queries for specific types of data. To look for structures with a particular sequence motif, try one of the techniques below with the Sequence Features > Sequence Motif option. Users can query for an exact sequence or for a sequence pattern written in regular expression syntax. Regular expressions are a powerful notation for defining complex sequence patterns.

Short Sequence Fragments
The sequence motif search, unlike BLAST or FASTA, can search for short sequence fragments of any size, such as NPPTP
Wildcard Searches
Use an 'X' in the sequence for wildcard searching. For example, XPPXP can be entered to look for SH3 domains using the consensus sequence -X-P-P-X-P (where X is a variable residue and P is proline)
Multiples of Variable Residues
The {n} notation can be used, where n is the number of variable residues. To query a motif with 7 variables between residues W and G, and 20 variable residues between G and L, try WX{7}GX{20}L
Ranges of Variable Residues
The {n,m} notation can be used to indicate ranges of variable residues, where n is the minimum and m the maximum number of repetitions. For example, the zinc finger motif that binds Zn in a DNA-binding domain can be expressed as: CX{2,4}CX{12}HX{3,5}H
Motifs at the Beginning of a Sequence
The '^' operator searches for sequence motifs at the beginning of a protein sequence. Two ways of looking for sequences with N-terminal histidine tags are: ^HHHHHH and ^H{6}
Alternative Residues
Square brackets specify alternative residues at a particular position. To search for a Walker (P loop) motif that binds ATP or GTP, try: [AG]XXXXGK[ST]

The search will look for sequences with A or G, followed by 4 variable residues, then G K, and finally S or T.
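The motif notations above map almost directly onto standard regular expressions, so they can be tried out locally before running a query. The following is a minimal sketch, assuming that 'X' means any single residue and that the remaining notations ({n}, {n,m}, ^, and square brackets) carry the same meaning as in Python's re module. The helper names motif_to_regex and find_motif are hypothetical, and the sketch does not reproduce how the RCSB PDB search is implemented.

```python
import re


def motif_to_regex(motif):
    """Translate a motif in the notation described above into a Python regex."""
    # Assumption: 'X' stands for any single residue; {n}, {n,m}, ^ and [...]
    # are passed through unchanged because re interprets them the same way.
    # Caveat: an 'X' placed inside square brackets is not handled.
    return re.compile(motif.upper().replace("X", "."))


def find_motif(motif, sequence):
    """Return (0-based start, matched fragment) for each non-overlapping hit."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Illustrative sequences only; they are not taken from the newsletter.
    print(find_motif("[AG]XXXXGK[ST]", "MTEYKLVVVGAGGVGKSALT"))
    print(find_motif("^H{6}", "HHHHHHSSGLVPR"))
    print(find_motif("XPPXP", "MAPPLPGK"))
    zinc_like = "K" + "C" + "AA" + "C" + "A" * 12 + "H" + "AAA" + "H"
    print(find_motif("CX{2,4}CX{12}HX{3,5}H", zinc_like))
```

Because finditer reports non-overlapping matches, a sequence containing overlapping occurrences of a motif will list only the leftmost of each overlapping group.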

Time-stamped Copies of PDB Archive Available via FTP

A snapshot of the PDB archive (ftp.wwpdb.org) as of January 4, 2010 has been added to ftp://snapshots.wwpdb.org/. Snapshots of the PDB have been archived annually since 2004. It is hoped that these snapshots will provide readily identifiable data sets for research on the PDB archive.

The directory 20100104 includes the 62,388 experimentally-determined coordinate files and related experimental data that were available at that time. Coordinate data are available in PDB, mmCIF, and XML formats. The date and time stamp of each file indicate the last time the file was modified.

The script at ftp://snapshots.wwpdb.org/rsyncSnapshots.sh may be used to make a local copy of a snapshot or sections of the snapshot.

1. ChemAxon Ltd. MarvinSketch. http://www.chemaxon.com/marvin/release-notes.html.
2. T. A. Tatusova & T. L. Madden. (1999) BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol Lett 174, 247-250.
3. S. B. Needleman & C. D. Wunsch. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48, 443-453.
4. T. F. Smith & M. S. Waterman. (1981) Identification of common molecular subsequences. J Mol Biol 147, 195-197.
5. Y. Ye & A. Godzik. (2003) Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics 19, ii246-255.
6. I. N. Shindyalov & P. E. Bourne. (1998) Protein structure alignment by incremental combinatory extension of the optimum path. Protein Eng 11, 739-747.
7. A. R. Ortiz, C. E. Strauss & O. Olmea. (2002) MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci 11, 2606-2621.
8. Y. Zhang & J. Skolnick. (2005) TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res 33, 2302-2309.
9. M. J. Sippl & M. Wiederstein. (2008) A note on difficult structure alignment problems. Bioinformatics 24, 426-427.

Participating RCSB Members: Rutgers • SDSC/SKAGGS/UCSD
E-mail: info@rcsb.org • Web: www.pdb.org • FTP: ftp.wwpdb.org
The RCSB PDB is a member of the wwPDB (www.wwpdb.org)