Dr. Philip E. Bourne, RCSB Protein Data Bank
Philip E. Bourne is a Professor in the Department of Pharmacology at the
University of California, San Diego, co-director of the RCSB PDB, and an
Adjunct Professor at the Burnham Institute and the Keck Graduate Institute.
He received his Ph.D. in chemistry from the Flinders University of South Australia
in 1980, where he studied the structural and electrophilic effects of substitution
on fully saturated caged hydrocarbon molecules. While a post-doctoral fellow at
Sheffield University, UK, he contributed to the understanding of the structural
role of ferritin in iron storage. Later, as a Senior Research Scientist at Columbia
University, he proposed mechanisms for the role of caracurines and snake toxins that
operate postsynaptically. During the 1980s, first as the Director of the Cancer Center
Computer Facility, and later Director of the Medical School Computer Facility at Columbia,
he helped establish a tumor registry and various applications and databases in support
of patient care. As a Senior Associate of the Howard Hughes Medical Institute in the
early 1990s, he worked on developing high-performance hardware and software for
computational structural biology. He moved to UCSD in 1995 to work on structural
bioinformatics. His current research interests are in structural genomics, the
structural basis of evolution and immunology, apoptosis, cell signaling, data
and knowledge modeling and scientific visualization.
Bourne is an elected Fellow of the American Medical Informatics Association and
past President of the International Society for Computational Biology. He is
the Founding Editor-in-Chief of the open access journal PLoS Computational
Biology, on the Advisory Board of Biopolymers and on the Editorial Boards of
Proteins: Structure Function and Bioinformatics, Biosilico and IEEE Trends
in Computational Biology and Bioinformatics. He is the author of over 200
scientific papers and 4 books. He has received two UCSD Connect Awards for
new inventions in the areas of comparative protein structure analysis and
shared visualization. He was the recipient of the 2002 Sun Microsystems
Convergence Award and the 2004 Convocation Medal for career achievement
from his graduate university. He has co-founded four companies.
What is the current impact of the PDB archive on biology, and
what is the future of the archive?
Given that more than six
million data sets are downloaded from the wwPDB ftp archives
each month, clearly the impact is large. The archive is
recognized as a critical component in new drug discovery and
development processes, and in the advancement of structural
biology. While part of this usage is well understood - for
example, there are many instances where structure provided a
better understanding of biological function in disease states
that led to the treatment of those diseases through new
drugs - I suspect that there is a lot more to this story. A
challenge in the next 5 years for all of the wwPDB is to
better understand usage patterns and to help specific
communities use the PDB archive in a way that would be the
most beneficial to research and education.
Education is of keen
interest at the RCSB PDB. Students in grades K-12 will be the
leading scientists of tomorrow, and make up a key focus for
our outreach programs. Structural biology has an advantage, as
it is a visual science that can captivate young people. The
RCSB PDB reaches out to these students through resources such
as the New Jersey Science Olympiad and the Molecule of the
Month. Of course, we would very much like to do more. One way
we could proceed would be to take advantage of changing usage
patterns on the web. Students today are very communicative
online and use various social networking sites for hours on
end. They are also part of the "Wiki Generation", where
knowledge is defined by community input and consensus. Perhaps
we at the RCSB PDB could capture this collective knowledge
from teachers and students to create lessons around specific
molecules and classes of molecules?
What about the older generation?
For established life
scientists, structure is often not a consideration and yet it
has a great deal to offer. It is my experience that many life
scientists associate molecular biology with DNA and protein
sequences, and then skip structural biology to consider
biochemical pathways, cellular processes, and whole cells and
organisms. Let me give you an example from recent research
work in my laboratory that makes this point using evolutionary
biologists as the test case. Since the time of Darwin,
evolution has been studied through simple observation by
paleontologists, zoologists, and botanists. Molecular biology,
through protein and DNA sequencing, has revolutionized these
evolutionary studies and allowed us to confirm and adjust the
tree of life. But sequence has its limitations. The sequence
signal degrades over long evolutionary time scales, and
distant relationships cannot be seen. Structure is far more
conserved than sequence over evolutionary time scales. With
our ability to map structures to the ever-increasing number of
fully completed proteomes, new insights can be gained. Very few
evolutionary biologists think of using structure in this way.
One recent study from our laboratory showed how the tree of
life could be reconstructed just by considering whether given
species did or did not contain specific structural
superfamilies of proteins defined by SCOP. In my view, the
RCSB PDB has a role in facilitating these new kinds of studies
to bring them to the attention of a broader community. So in
this example, we could facilitate these studies by mapping
structural domains and their changing arrangements onto the
tree of life.
Given these kinds of developments, where do you see the RCSB PDB
in 10 years?
The core mission of the RCSB
PDB - providing timely delivery of high quality and complete
structure data and useful and unique views of that data to
enable scientific innovation - will not change. Of course, there
will continue to be more and different types of data and the
RCSB PDB will need to maintain these high standards of quality
while catering to new types of delivery technology. It is hard
to believe that the internet has only been with us in a big
way for ten years or so. Given the fundamental change in how
we do science that has been brought about by the web, it is at
least conceivable that how we do science will change even more
dramatically in the future, even though we are hard-pressed to
detail what those changes might be at this time. I would guess
that we would need to provide data to people, software, and
applications in seamless ways at very different degrees of
granularity. Currently, most RCSB PDB queries return specific
structures, but in the future you can imagine many more
fine-grained requests from specific classes of scientist. For
example, the pharmacy students I teach might use their
handheld devices to ask a question like, "We see significant
instances of myocardial infarction in patients on selective
estrogen receptor modulator drugs like tamoxifen; what is the
underlying biochemistry and molecular biology causing these
side effects?" The RCSB PDB's role in this request could
conceivably be to return and compare the receptors known to
bind this class of drugs and allow the student to better
understand the molecular implications. Inherent in this kind
of request is the RCSB PDB's ability to integrate with other
resources that permit the field of genomic medicine to advance
and to return data such that non-specialists can answer their
questions. These are significant (but fun) challenges.
Let's bring you back to the more immediate future. The wwPDB
recently remediated the entire PDB archive. What effect has
this had on the RCSB PDB's query and reporting engine?
The remediation effort is
fundamental to the more far-reaching developments like those I
have just discussed. Consistent representation of the data we
have and the data we will collect going forward is critical if
we are to use the archive effectively and integrate with other
sources of data.
A very pragmatic example is
the work that has gone into the Chemical Component Dictionary.
As a result of this project, we can now reliably query ligands
in the PDB archive through their names and/or chemical
structures.
You are heavily involved with the computational and systems
biology community - how do these scientists use the RCSB
PDB?
I would say my area of work
is best characterized as "structural bioinformatics," which is
a small part of computational and systems biology. Even with
this group of scientists, where structure is central, the
computational and systems biology communities are not yet
taking full advantage of what we have. Most work is still
performed using PDB files rather than XML files, and hence a
lot of useful information is not being utilized by this
community. This will change slowly as a generation of
scientists more adept at dealing with XML starts to have a
stronger voice in the community. In terms of the bigger
picture, systems biology is in some sense the molecular
simulations of the new age. Rather than simulate the actions
of a few molecules we are simulating the actions of complete
pathways, cells and more. For now, at least, this work has
largely bypassed structure, but I suspect that will change.
The devil is in the details, and in the world of systems
biology, structure may well provide those details. For
example, much effort is currently going into mapping and
describing the topologies of protein-protein interaction
networks across a wide range of species and cell types.
Eventually, it will be necessary to come back and explore
specific interactions and here structure will be important.
The challenge then for the RCSB PDB is to make available in a
simple way the details of those interactions.
You are very much involved with the Public Library of Science
(PLoS), a nonprofit organization committed to making the
world's scientific and medical literature freely available
to the public. Why is that important to you?
My work with PLoS is similar
to our work on the PDB, where we all work hard to make data
freely available to the worldwide community. PLoS tries to do
the same thing, but with the scientific literature. PLoS is a
standard bearer for the open access movement and I am very
passionate about it. Open access to the literature is a
controversial issue, and I appreciate the many sides to the
argument (which I will not get into here). It is important to
me that anyone can read the results of my research, but I
acknowledge that open access is a business model that is yet
to be proven. Nevertheless, I believe there is one component
of open access that is very important. Open access is not just
about access, but about copyright and format. Allowing anyone
to use material from an article, provided they give the
appropriate attribution, opens up many possibilities when that
information is marked up in XML and accessible online. My
research group is experimenting with this through two
NSF-funded pilot studies. The first is to integrate journal
content with database content. Data and the knowledge derived
from that data have traditionally been reported and kept
separate (databases vs. publications). There is no reason for
this, and so we are trying to come up with ways to provide
more seamless and useful access between data and literature.
This would seem to be particularly relevant to structural
biology. A PDF file is a pretty poor way to express the
aspects of a structure-function relationship that need to be
looked at graphically. Take a simple example: a reader could
go to a paper, and upon seeing a figure, click that figure. An
identical version of that figure could be launched in a form
that could be rotated, annotated, and used to ask for more
information. My lab is developing prototypes that fulfill this
idea. A second effort integrates open access content with
video. We have developed a site called scivee.tv that attempts
to cater for the upcoming YouTube generation of scientists. In
"pubcasts," authors talk about their work in a video which is
then integrated with the open access content of their paper.
Relevant parts of the paper can be highlighted as they speak. It
remains to be seen whether scientists like this approach and
whether it improves our ability to comprehend complex
material. If the answer is "yes", it may be useful to incorporate
these kinds of developments into the RCSB PDB.