Published quarterly by the Research Collaboratory
for Structural Bioinformatics Protein Data Bank
Winter 2013
Number 56

NEWSLETTER

Data Query, Reporting and Access

2012 Website Statistics

Access statistics for 2012 are shown.

Month      Unique Visitors  Visits   Bandwidth
January    242,584          580,079  749.94 GB
February   258,865          628,875  877.04 GB
March      255,442          613,853  825.07 GB
April      258,130          617,748  778.90 GB
May        244,865          615,509  794.33 GB
June       206,864          525,815  624.38 GB
July       184,117          478,089  697.83 GB
August     175,961          439,848  484.91 GB
September  241,088          572,574  755.14 GB
October    316,640          723,577  970.88 GB
November   309,021          711,058  1,513.52 GB
December   265,140          599,337  1,179.91 GB


Improved Drug Searches and Annotations

Drug and drug target data from DrugBank (www.drugbank.ca) [1] have been integrated with the RCSB PDB website.

The simple top bar search can find generic and brand names of drugs. For example, type Zoloft into the simple top bar search to find the related Ligand Summary page.

The Ligand Summary page links to entries containing Zoloft and provides summary information about the drug. The Drug Info widget on this page lists and links to corresponding data from DrugBank (when available), including DrugBank ID, drug name, groups, brand name, descriptions, and more.



The Latest Structures widget, located on the home page, cycles through newly released entries.

Users can toggle through different views of search results.

Explore Different Views of Search Results

In addition to the original query results browser, now called the detailed view, results can be browsed using condensed, timeline or gallery views. The views are synchronized; selecting or deselecting a structure in one view will have the same effect in the others.

The default (detailed) view was designed to provide detailed information about each entry: a thumbnail image, release date, classification, compound, experimental type, citation, links to related Molecule of the Month articles, and more.

The condensed view concisely lists structures, showing only the PDB ID, title, and macromolecule name for each result.

The gallery view shows only the structure image and PDB ID. Images can be resized by using the options dropdown. The PDB ID links to the Structure Summary page for the entry.

The timeline view takes the current search results and generates a visual timeline ordered by release date, similar to PDB-101's Author Profile feature.

These different views were designed to support the interests of different users.


Map PDB Structures to Full-length Protein Sequences

Protein Feature View from the Structure Summary page for 2vx3 [2]

The new Protein Feature View visually summarizes how a full-length protein sequence from UniProtKB corresponds to PDB entries. It also loads annotations from external databases (such as Pfam) and homology models from the Protein Model Portal. Annotations visualizing predicted regions of protein disorder (computed with JRONN) and hydrophobic regions (as computed using a sliding window approach) are also displayed.
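The sliding window approach mentioned above for hydrophobic regions can be illustrated with a short sketch. The newsletter does not say which hydropathy scale, window size, or cutoff the website uses, so everything here is an assumption chosen for illustration: the Kyte-Doolittle scale, a 9-residue window, a threshold of 1.6, and the function names themselves.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; the website's actual
# scale is not stated in the newsletter).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_window_hydropathy(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence."""
    if window < 1:
        raise ValueError("window must be at least 1")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_regions(sequence, window=9, threshold=1.6):
    """Return merged (start, end) residue ranges, 1-based and inclusive,
    covered by windows whose mean hydropathy reaches the threshold."""
    regions = []
    for i, mean in enumerate(sliding_window_hydropathy(sequence, window)):
        if mean < threshold:
            continue
        start, end = i + 1, i + window
        if regions and start <= regions[-1][1] + 1:
            # Overlaps or touches the previous region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Toy sequence: a leucine stretch flanked by aspartates.
    print(hydrophobic_regions("DDDDDLLLLLLLLLDDDDD", window=5, threshold=3.0))
```

Each window is scored by the average hydropathy of its residues, and consecutive windows above the cutoff are merged into a single region, which is the kind of range a feature track would draw.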

For individual entries, the Protein Feature View is available from the Molecular Description Widget on Structure Summary pages. The example shown for PDB ID 2vx3 illustrates how the ranges of a protein that have been observed in an experiment (in blue) correspond to the full-length UniProtKB sequence (in grey). The secondary structure information from the PDB entry is also shown (helices in red, beta strands in yellow).

Various features that are known for the UniProtKB sequence are displayed in green as they correlate with regions in the PDB entry. Active sites (from UniProtKB and the PDB entry) are also annotated. Moving the mouse cursor over the "lollipops" displays the residue label. Mousing over the images shown in the "Secstruc" row reveals secondary structure from the PDB entry.

This view can be expanded to map all PDB entries related to a single UniProtKB sequence by selecting the Protein Feature View link shown in this widget. By default, a few representative PDB entries are shown to give an overview of which regions of the UniProtKB sequence are covered by PDB entries. Selecting the plus sign or the "Show All" button will expand the view to show all related PDB chains, which can then be sorted by resolution, length, and release date. Protein Feature View images can be exported as Scalable Vector Graphics (SVG) files.

The PDB to UniProtKB mapping is based on the data provided by the Structure Integration with Function, Taxonomy and Sequences (SIFTS) initiative.


MyPDB: Keep up-to-date with new structures... automatically!

MyPDB will email alerts when structures matching saved queries are released.

Create a personalized version of the RCSB PDB website using these MyPDB features:
  • Sign up for regular email alerts. Searches saved in MyPDB can be set to run with each update. Email alerts (weekly or monthly) will be sent when new entries matching the search are released in the PDB archive.
  • Save searches online. The MyPDB Saved Query Manager stores any type of RCSB PDB structure search: a particular keyword, sequence, ligand, Advanced Search composite query, and more. Run these searches at any time with the click of a button.
  • Store personal notes and bookmark structures. Save personal annotations and notes on the Structure Summary tab of any PDB entry. Want to keep a running list of interesting structures? MyPDB can keep a favorites list of PDB entries. The Personal Annotations summary page provides easy access to all of these tagged structures and annotations.
All MyPDB account information is kept private and secure, and can be updated at any time.


Advanced Search: Multiple ID Search

Have a list of PDB IDs you'd like to learn more about? Use Advanced Search to view all of the structures in the Query Results Browser.

From the Advanced Search pull-down menu, select ID(s) and Keywords>PDB ID(s). Enter the list of multiple IDs, which can be separated by commas or white space, including line breaks. The structures can then be explored using the different query results browser views: detailed, condensed, timeline or gallery.

Users can also link directly to multiple structures in the Query Results Browser using the syntax:
http://www.rcsb.org/pdb/search/smart.do?smartSearchSubtype_1=StructureIdQuery&structureIdList_1=1MI6+1MVR

This example will launch entries 1mi6 and 1mvr in the Query Results Browser; other IDs can be added to the URL:
http://www.rcsb.org/pdb/search/smart.do?smartSearchSubtype_1=StructureIdQuery&structureIdList_1=1MI6+1MVR+4GMK+4GSB
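For anyone generating such links from a script, the following sketch assembles the URL from a free-form list of IDs. It uses only the URL pattern quoted above. The function names are hypothetical, and the ID check (four alphanumeric characters starting with a digit) is an assumption about the ID format rather than something the website is known to enforce.

```python
import re

# Base URL and parameter names copied from the examples above.
SEARCH_URL = "http://www.rcsb.org/pdb/search/smart.do"


def parse_pdb_ids(text):
    """Split a string on commas and white space (including line breaks)
    and return the IDs in upper case."""
    ids = []
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue
        # Assumed ID format: a digit followed by three letters or digits.
        if not re.fullmatch(r"[0-9][A-Za-z0-9]{3}", token):
            raise ValueError(f"not a PDB ID: {token!r}")
        ids.append(token.upper())
    return ids


def build_multi_id_url(text):
    """Return a Query Results Browser link for all IDs found in text."""
    ids = parse_pdb_ids(text)
    if not ids:
        raise ValueError("no PDB IDs given")
    return (
        f"{SEARCH_URL}?smartSearchSubtype_1=StructureIdQuery"
        f"&structureIdList_1={'+'.join(ids)}"
    )


if __name__ == "__main__":
    print(build_multi_id_url("1mi6, 1mvr\n4gmk 4gsb"))
```

The IDs are joined with plus signs, matching the two example links above.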


Advanced Search: Sequence Motif

Advanced Search lets users build queries for specific types of data. Users can query for an exact sequence or for a sequence pattern using regular expression syntax. To look for structures with a particular sequence motif, try using one of these techniques with the Sequence Features > Sequence Motif option.

Examples of zinc fingers from the Molecule of the Month.

  • Short Sequence Fragments. The sequence motif search, unlike BLAST or FASTA, can search for short sequence fragments of any size, such as NPPTP
  • Wildcard Searches. Use an 'X' in the sequence for wildcard searching. For example, XPPXP can be entered to look for SH3 domains using the consensus sequence -X-P-P-X-P (where X is a variable residue and P is proline)
  • Multiples of Variable Residues. The {n} notation can be used, where n is the number of variable residues. To query a motif with 7 variables between residues W and G, and 20 variable residues between G and L, try WX{7}GX{20}L
  • Ranges of Variable Residues. The {n,m} notation can be used to indicate ranges of variable residues, where n is the minimum and m the maximum number of repetitions. For example, the zinc finger motif that binds Zn in a DNA-binding domain can be expressed as: CX{2,4}CX{12}HX{3,5}H
  • Motifs at the Beginning of a Sequence. The '^' operator searches for sequence motifs at the beginning of a protein sequence. Two ways of looking for sequences with N-terminal histidine tags are: ^HHHHHH and ^H{6}
  • Alternative Residues. Square brackets specify alternative residues at a particular position. To search for a Walker (P loop) motif that binds ATP or GTP, try: [AG]XXXXGK[ST]. The search will look for sequences with A or G, followed by 4 variable residues, then G and K, and finally S or T.
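The motif notation above maps almost directly onto ordinary regular expressions, so patterns can be tried out on a local sequence before running a search. The sketch below is not the website's implementation. It assumes that replacing each X with the regular-expression wildcard "." is a faithful translation, and the function names and test sequences are invented for illustration.

```python
import re


def motif_to_regex(motif):
    """Translate a sequence motif into a compiled Python regular expression.

    The {n}, {n,m}, ^ and [...] notations are already valid regex syntax,
    so only the X wildcard needs to be rewritten (assumed equivalent to '.').
    """
    return re.compile(motif.upper().replace("X", "."))


def find_motif(motif, sequence):
    """Return (position, matched text) pairs, with 1-based positions,
    for non-overlapping matches of the motif in the sequence."""
    pattern = motif_to_regex(motif)
    return [
        (match.start() + 1, match.group())
        for match in pattern.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    # Made-up sequences, used only to exercise each notation.
    print(find_motif("^H{6}", "HHHHHHMKT"))               # N-terminal His tag
    print(find_motif("[AG]XXXXGK[ST]", "MKGAGSGGKTLL"))   # Walker (P loop) motif
    zinc_finger = "CAAC" + "A" * 12 + "HAAAH"
    print(find_motif("CX{2,4}CX{12}HX{3,5}H", zinc_finger))
```

Because finditer reports non-overlapping matches only, a sequence containing overlapping occurrences of a motif will list fewer hits than a residue-by-residue scan would.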


References

  1. C. Knox, V. Law, T. Jewison, P. Liu, S. Ly, A. Frolkis, A. Pon, K. Banco, C. Mak, V. Neveu, Y. Djoumbou, R. Eisner, A. C. Guo, D. S. Wishart. (2011) DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Res. 39: D1035-1041.