In 1971, the structural biology community established the single worldwide archive for macromolecular structure data: the Protein Data Bank (PDB). From its inception, the PDB has embraced a culture of open access, leading to its widespread use by the research community. PDB data are used by hundreds of data resources and millions of users exploring fundamental biology, energy, and biomedicine.
2021 marks the 50th year of the PDB. In January, the PDB contained more than 170,000 structures; ~150,000 had corresponding “primary citations” describing these entries in a peer-reviewed journal.
The National Library of Medicine assigns MeSH (Medical Subject Headings) from a controlled vocabulary to index articles for PubMed. MeSH terms typically appear in a hierarchical tree structure that starts with 16 main branches.
RCSB PDB has a separate browser to find PDB structures based on these hierarchical trees at RCSB.org.
The RCSB PDB Citation MeSH Network Explorer flattens these trees into co-occurrence networks of MeSH terms associated with PDB entries. Each node on the graph is a publication, and nodes are linked when they share MeSH terms.
Publications that share similar MeSH terms are clustered together into Groups; the largest groups are color-coded. Depending on the size of the network, groups contain at least one, two, or three common MeSH terms. Clicking on a node reveals information about the publication, clustered group, and related PDB structures. Nodes that have multiple terms in common are located near each other; nodes that have fewer terms in common are located farther apart.
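The linking rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Explorer's actual implementation: the publication IDs, the sample MeSH terms, and the `build_links` function with its `min_shared` parameter are all hypothetical names introduced here.

```python
from itertools import combinations


def build_links(pub_terms, min_shared=1):
    """Link every pair of publications that shares at least `min_shared` MeSH terms.

    `pub_terms` maps a publication ID to the set of MeSH terms assigned to it.
    Returns a dict mapping (pub_a, pub_b) to the set of shared terms.
    """
    links = {}
    for pub_a, pub_b in combinations(sorted(pub_terms), 2):
        shared = pub_terms[pub_a] & pub_terms[pub_b]
        if len(shared) >= min_shared:
            links[(pub_a, pub_b)] = shared
    return links


# Hypothetical sample data, for illustration only.
pub_terms = {
    "pub1": {"Hemoglobins", "Oxygen", "Crystallography, X-Ray"},
    "pub2": {"Hemoglobins", "Oxygen", "Mutation"},
    "pub3": {"Ribosomes", "Mutation"},
}

# Raising the threshold thins out the network.
for min_shared in (1, 2, 3):
    links = build_links(pub_terms, min_shared)
    print(min_shared, sorted(links))
```

Comparing every pair of publications is quadratic in the number of nodes, which is acceptable for a sketch; a larger network would more likely index publications by term first.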
This new way of visualizing MeSH terms can provide insights into relationships between PDB primary citations.
10,699 publications are represented throughout these networks. MeSH terms very common in PDB-related articles (e.g., protein) were filtered out to generate more meaningful networks. Depending on the size of the branch network, publications with at least one, two, or three common MeSH terms are included to simplify navigation.
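One way such a filter could work is to drop any term attached to more than a chosen fraction of publications. The sketch below assumes that definition of "very common"; the actual criterion used for the Explorer is not stated here, and the `drop_common_terms` function, its `max_fraction` parameter, and the sample data are hypothetical.

```python
from collections import Counter


def drop_common_terms(pub_terms, max_fraction=0.5):
    """Remove MeSH terms assigned to more than `max_fraction` of publications.

    Returns the filtered mapping and the set of terms that were removed.
    """
    n_pubs = len(pub_terms)
    # Count how many publications carry each term.
    counts = Counter(term for terms in pub_terms.values() for term in terms)
    too_common = {term for term, n in counts.items() if n / n_pubs > max_fraction}
    filtered = {pub: terms - too_common for pub, terms in pub_terms.items()}
    return filtered, too_common


# Hypothetical sample data, for illustration only.
pub_terms = {
    "pub1": {"Proteins", "Hemoglobins"},
    "pub2": {"Proteins", "Ribosomes"},
    "pub3": {"Proteins", "Hemoglobins"},
    "pub4": {"Proteins", "Kinetics"},
}

filtered, removed = drop_common_terms(pub_terms, max_fraction=0.5)
print(removed)
print(filtered)
```

In this sample, a term shared by every publication carries no information about how publications differ, so removing it leaves only the terms that can distinguish one group from another.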
Number of publications and corresponding PDB entries by branch
Branch | Publications | PDB Entries |
---|---|---|
A. Anatomy | 393 | 6,706 |
B. Organisms | 1,725 | 15,503 |
C. Diseases | 709 | 9,495 |
D. Chemicals and Drugs | 4,825 | 22,139 |
E. Analytical, Diagnostic and Therapeutic Techniques and Equipment | 2,204 | 17,226 |
F. Psychiatry and Psychology | 411 | 6,911 |
G. Phenomena and Processes | 1,128 | 12,368 |
H. Disciplines and Occupations | 321 | 5,883 |
I. Anthropology, Education, Sociology and Social Phenomena | 21 | 485 |
J. Technology, Industry, Agriculture | 78 | 1,809 |
K. Humanities | 9 | 215 |
L. Information Science | 52 | 1,217 |
M. Named Groups | 194 | 3,878 |
N. Health Care | 114 | 2,582 |
Z. Geographicals | 120 | 2,670 |
The data used in each branch are available for download as CSV files.
This MeSH Explorer has been developed for RCSB PDB by Digital Science and is powered by Dimensions, the world’s largest linked research information dataset.