PDB Citation MeSH Network Explorer: Overview

Overview

In 1971, the structural biology community established the Protein Data Bank (PDB) as the single worldwide archive for macromolecular structure data. From its inception, the PDB has embraced a culture of open access, leading to its widespread use by the research community. PDB data are used by hundreds of data resources and millions of users exploring fundamental biology, energy, and biomedicine.

2021 marks the 50th year of the PDB. In January 2021, the PDB contained more than 170,000 structures; about 150,000 had corresponding “primary citations” describing these entries in a peer-reviewed journal.

The National Library of Medicine assigns MeSH (Medical Subject Headings) terms from a controlled vocabulary to index articles for PubMed. MeSH terms are organized in a hierarchical tree structure that starts with 16 main branches:

  1. Anatomy
  2. Organisms
  3. Diseases
  4. Chemicals and Drugs
  5. Analytical, Diagnostic and Therapeutic Techniques and Equipment
  6. Psychiatry and Psychology
  7. Phenomena and Processes
  8. Disciplines and Occupations
  9. Anthropology, Education, Sociology and Social Phenomena
  10. Technology, Industry, Agriculture
  11. Humanities
  12. Information Science
  13. Named Groups
  14. Health Care
  15. Publication Characteristics
  16. Geographicals
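
The 16 branches above correspond to the single-letter MeSH categories (A through N, plus V and Z) that prefix every MeSH tree number. As a minimal sketch, assuming tree numbers of the form letter-plus-digits such as "D12.776", a lookup from tree number to branch could look like the following. The names `MESH_BRANCHES` and `branch_of` are hypothetical and are not part of any RCSB PDB tool.

```python
# Single-letter MeSH categories for the 16 main branches listed above.
MESH_BRANCHES = {
    "A": "Anatomy",
    "B": "Organisms",
    "C": "Diseases",
    "D": "Chemicals and Drugs",
    "E": "Analytical, Diagnostic and Therapeutic Techniques and Equipment",
    "F": "Psychiatry and Psychology",
    "G": "Phenomena and Processes",
    "H": "Disciplines and Occupations",
    "I": "Anthropology, Education, Sociology and Social Phenomena",
    "J": "Technology, Industry, Agriculture",
    "K": "Humanities",
    "L": "Information Science",
    "M": "Named Groups",
    "N": "Health Care",
    "V": "Publication Characteristics",
    "Z": "Geographicals",
}


def branch_of(tree_number: str) -> str:
    """Return the main branch name for a MeSH tree number (hypothetical helper)."""
    letter = tree_number.strip()[:1].upper()
    if letter not in MESH_BRANCHES:
        raise ValueError(f"Unrecognized MeSH tree number: {tree_number!r}")
    return MESH_BRANCHES[letter]
```

For example, `branch_of("D12.776")` resolves to the Chemicals and Drugs branch.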

RCSB PDB has a separate browser to find PDB structures based on these hierarchical trees at RCSB.org.

The RCSB PDB Citation MeSH Network Explorer flattens these trees into co-occurrence networks of MeSH terms associated with PDB entries. Each node on the graph is a publication, and nodes are linked when they share MeSH terms.

Publications that share similar MeSH terms are clustered together into Groups; the largest groups are color-coded. Depending on the size of the network, groups contain at least one, two, or three common MeSH terms. Clicking on a node reveals information about the publication, its clustered group, and related PDB structures. Nodes that have multiple terms in common are located near each other; nodes that have fewer terms in common are located farther apart.
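
The linking rule described above (publications as nodes, edges between publications that share MeSH terms) can be illustrated with a short sketch. This is not the Explorer's actual implementation, which is not described here; the function name `build_network`, the `min_shared` parameter, and the toy publication data are all assumptions made for illustration.

```python
from itertools import combinations


def build_network(pub_terms, min_shared=1):
    """Link publications that share at least `min_shared` MeSH terms.

    pub_terms maps a publication id to its set of MeSH terms.
    Returns a dict mapping (pub_a, pub_b) to the set of shared terms.
    """
    edges = {}
    for a, b in combinations(sorted(pub_terms), 2):
        shared = pub_terms[a] & pub_terms[b]
        if len(shared) >= min_shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical toy data: three publications and their MeSH terms.
pubs = {
    "pub1": {"Crystallography, X-Ray", "HIV Protease", "Drug Design"},
    "pub2": {"HIV Protease", "Drug Design", "Binding Sites"},
    "pub3": {"Ribosomes", "Binding Sites"},
}

loose = build_network(pubs, min_shared=1)   # pub1-pub2 and pub2-pub3 are linked
strict = build_network(pubs, min_shared=2)  # only pub1-pub2 remains linked
```

Raising `min_shared` from one to two or three thins out the edges, which mirrors the idea of requiring more common terms in larger networks.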

This new way of visualizing MeSH terms can provide insights into relationships between PDB primary citations.

Filtering and Data

A total of 10,699 publications are represented across these networks. MeSH terms very common in PDB-related articles (e.g., protein) were filtered out to generate more meaningful networks. Depending on the size of the branch network, publications with at least one, two, or three common MeSH terms are included to simplify navigation.
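
As a rough illustration of the filtering step, the sketch below drops any MeSH term that appears in more than a chosen fraction of publications. The actual criterion used to decide which terms are "very common" is not stated here, so the `max_fraction` cutoff, the function name `drop_common_terms`, and the toy data are assumptions.

```python
from collections import Counter


def drop_common_terms(pub_terms, max_fraction=0.5):
    """Remove MeSH terms found in more than `max_fraction` of publications.

    pub_terms maps a publication id to its set of MeSH terms.
    Returns a new dict with the overly common terms removed.
    """
    total = len(pub_terms)
    counts = Counter(term for terms in pub_terms.values() for term in terms)
    common = {term for term, n in counts.items() if n / total > max_fraction}
    return {pub: terms - common for pub, terms in pub_terms.items()}


# Hypothetical toy data: "Proteins" appears in every publication.
raw = {
    "pub1": {"Proteins", "Ribosomes"},
    "pub2": {"Proteins", "Kinetics"},
    "pub3": {"Proteins", "Models, Molecular"},
}
filtered = drop_common_terms(raw, max_fraction=0.5)
```

After filtering, the ubiquitous term no longer links every publication to every other one, so the remaining edges reflect more specific shared topics.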


Number of publications and corresponding PDB entries by branch

Branch                                                              Publications  PDB Entries
A. Anatomy                                                          393           6,706
B. Organisms                                                        1,725         15,503
C. Diseases                                                         709           9,495
D. Chemicals and Drugs                                              4,825         22,139
E. Analytical, Diagnostic and Therapeutic Techniques and Equipment  2,204         17,226
F. Psychiatry and Psychology                                        411           6,911
G. Phenomena and Processes                                          1,128         12,368
H. Disciplines and Occupations                                      321           5,883
I. Anthropology, Education, Sociology and Social Phenomena          485           21,485
J. Technology, Industry, Agriculture                                78            1,809
K. Humanities                                                       9             215
L. Information Science                                              52            1,217
M. Named Groups                                                     194           3,878
N. Health Care                                                      114           2,582
Z. Geographicals                                                    120           2,670

The data used in each branch are available for download as CSV files.

This MeSH Explorer has been developed for RCSB PDB by Digital Science and is powered by Dimensions, the world’s largest linked research information dataset.