Dr. Philip E. Bourne, RCSB Protein Data Bank

Philip E. Bourne is a Professor in the Department of Pharmacology at the University of California, San Diego, co-director of the RCSB PDB, and an Adjunct Professor at the Burnham Institute and the Keck Graduate Institute.

He received his Ph.D. in chemistry from the Flinders University of South Australia in 1980, where he studied the structural and electrophilic effects of substitution on fully saturated caged hydrocarbon molecules. While a post-doctoral fellow at Sheffield University, UK, he contributed to the understanding of the structural role of ferritin in iron storage. Later, as a Senior Research Scientist at Columbia University, he proposed mechanisms for the role of caracurines and snake toxins that operate postsynaptically. During the 1980s, first as the Director of the Cancer Center Computer Facility and later as Director of the Medical School Computer Facility at Columbia, he helped establish a tumor registry and various applications and databases in support of patient care. As a Senior Associate of the Howard Hughes Medical Institute in the early 1990s, he worked on developing high performance hardware and software for computational structural biology. He moved to UCSD in 1995 to work on structural bioinformatics. His current research interests are in structural genomics, the structural basis of evolution and immunology, apoptosis, cell signaling, data and knowledge modeling, and scientific visualization.

Bourne is an elected Fellow of the American Medical Informatics Association and past President of the International Society for Computational Biology. He is the Founding Editor-in-Chief of the open access journal PLoS Computational Biology, on the Advisory Board of Biopolymers, and on the Editorial Boards of Proteins: Structure, Function, and Bioinformatics, Biosilico, and IEEE/ACM Transactions on Computational Biology and Bioinformatics. He is the author of over 200 scientific papers and four books. He has received two UCSD Connect Awards for new inventions in the areas of comparative protein structure analysis and shared visualization. He was the recipient of the 2002 Sun Microsystems Convergence Award and the 2004 Convocation Medal for career achievement from his graduate university. He has co-founded four companies.

What is the current impact of the PDB archive on biology, and what is the future of the archive?

Given that more than six million data sets are downloaded from the wwPDB ftp archives each month, clearly the impact is large. The archive is recognized as a critical component in new drug discovery and development processes, and in the advancement of structural biology. While part of this usage is well understood - for example, there are many instances where structure provided a better understanding of biological function in disease states, which in turn led to new drugs to treat those diseases - I suspect that there is a lot more to this story. A challenge in the next five years for all wwPDB partners is to better understand usage patterns and to help specific communities use the PDB archive in the ways that are most beneficial to their research and education.

Education is of keen interest at the RCSB PDB. Students in grades K-12 will be the leading scientists of tomorrow and are a key focus of our outreach programs. Structural biology has an advantage here, as it is a visual science that can captivate young people. The RCSB PDB reaches out to these students through resources such as the New Jersey Science Olympiad and the Molecule of the Month. Of course, we would very much like to do more. One way we could proceed would be to take advantage of changing usage patterns on the web. Students today are very communicative online and spend hours on end on various social networking sites. They are also part of the "Wiki Generation", where knowledge is defined by community input and consensus. Perhaps we at the RCSB PDB could capture this collective knowledge from teachers and students to create lessons around specific molecules and classes of molecules?

What about the older generation?

For established life scientists, structure is often not a consideration, and yet it has a great deal to offer. It is my experience that many life scientists associate molecular biology with DNA and protein sequences, and then skip structural biology to consider biochemical pathways, cellular processes, and whole cells and organisms. Let me give you an example from recent research work in my laboratory that makes this point, using evolutionary biologists as the test case. Since the time of Darwin, evolution has been studied through simple observation by paleontologists, zoologists, and botanists. Molecular biology, through protein and DNA sequencing, has revolutionized these evolutionary studies and allowed us to confirm and adjust the tree of life. But sequence has its limitations. The sequence signal degrades over long evolutionary time scales, and distant relationships cannot be seen. Structure is far more conserved than sequence over evolutionary time scales. With our ability to map structures to the ever-increasing number of fully completed proteomes, new insights become possible. Very few evolutionary biologists think of using structure in this way. One recent study from our laboratory showed how the tree of life could be reconstructed just by considering whether a given species did or did not contain specific protein structural superfamilies, as defined by SCOP. In my view, the RCSB PDB has a role in facilitating these new kinds of studies and bringing them to the attention of a broader community. In this example, we could facilitate such studies by mapping structural domains and their changing arrangements onto the tree of life.
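The presence/absence idea can be made concrete with a small sketch. It assumes each species is represented by the set of SCOP superfamily identifiers detected in its proteome, measures dissimilarity between species with the Jaccard distance, and groups them with UPGMA clustering. Both choices are illustrative stand-ins, not necessarily the method used in the study mentioned above, and the species-to-superfamily assignments in the example are hypothetical toy data rather than real SCOP annotations.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """Distance between two superfamily sets: 1 - |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma(profiles: dict) -> object:
    """Build a rooted tree by UPGMA from presence/absence profiles.

    `profiles` maps a species name to the set of (hypothetical) superfamily
    identifiers found in its proteome. Returns a nested tuple of species names.
    """
    if not profiles:
        raise ValueError("need at least one species")
    names = sorted(profiles)
    # clusters[k] = (subtree, number of species in it); merged clusters are appended
    clusters = [(name, 1) for name in names]
    # Pairwise distances keyed by (i, j) with i < j
    dist = {
        (i, j): jaccard_distance(profiles[names[i]], profiles[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    active = list(range(len(names)))  # indices of clusters not yet merged (kept sorted)

    def key(p: int, q: int) -> tuple:
        return (p, q) if p < q else (q, p)

    while len(active) > 1:
        # Closest pair of active clusters (ties broken by index order)
        i, j = min(combinations(active, 2), key=lambda pair: dist[pair])
        (tree_i, size_i), (tree_j, size_j) = clusters[i], clusters[j]
        k = len(clusters)
        clusters.append(((tree_i, tree_j), size_i + size_j))
        active.remove(i)
        active.remove(j)
        # UPGMA update: size-weighted average of distances to the two merged clusters
        for m in active:
            dist[(m, k)] = (size_i * dist[key(i, m)] + size_j * dist[key(j, m)]) / (size_i + size_j)
        active.append(k)  # k is the largest index, so `active` stays sorted
    return clusters[active[0]][0]


if __name__ == "__main__":
    # Hypothetical toy profiles; the letters stand in for superfamily identifiers
    example = {
        "human": {"a", "b", "c", "d"},
        "mouse": {"a", "b", "c", "e"},
        "yeast": {"a", "b", "f"},
        "ecoli": {"a", "g", "h"},
    }
    print(upgma(example))  # prints ('ecoli', ('yeast', ('human', 'mouse')))
```

With real data, the profiles would come from assigning structural domains to the proteins of each completed proteome, and more careful phylogenetic methods might well be preferred over plain UPGMA.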

Given these kinds of developments, where do you see the RCSB PDB in 10 years?

The core mission of the RCSB PDB - providing timely delivery of high quality and complete structure data and useful and unique views of that data to enable scientific innovation - will not change. Of course, there will continue to be more and different types of data, and the RCSB PDB will need to maintain these high standards of quality while catering to new types of delivery technology. It is hard to believe that the internet has only been with us in a big way for ten years or so. Given the fundamental change in how we do science that has been brought about by the web, it is at least conceivable that how we do science will change even more dramatically in the future, even though we are hard-pressed to detail what those changes might be at this time. I would guess that we will need to provide data to people, software, and applications in seamless ways and at very different degrees of granularity. Currently, most RCSB PDB queries return specific structures, but in the future you can imagine many more fine-grained requests from specific classes of scientist. For example, the pharmacy students I teach might use their handheld devices to ask a question like, "We see significant instances of myocardial infarction in patients on selective estrogen receptor modulator drugs like tamoxifen; what is the underlying biochemistry and molecular biology causing these side effects?" The RCSB PDB's role in this request could conceivably be to return and compare the receptors known to bind this class of drugs, allowing the student to better understand the molecular implications. Inherent in this kind of request is the RCSB PDB's ability to integrate with other resources that are advancing the field of genomic medicine, and to return data in a form that lets non-specialists answer their questions. These are significant (but fun) challenges.

Let's bring you back more to the immediate future. The wwPDB recently remediated the entire PDB archive. What effect has this had on the RCSB PDB's query and reporting engine?

The remediation effort is fundamental to the more far-reaching developments like those I have just discussed. Consistent representation of the data we have and the data we will collect going forward is critical if we are to use the archive effectively and integrate with other sources of data.

A very pragmatic example is the work that has gone into the Chemical Component Dictionary. As a result of this project, we can now reliably query ligands in the PDB archive through their names and/or chemical structures.

You are heavily involved with the computational and systems biology community - how do these scientists use the RCSB PDB?

I would say my area of work is best characterized as "structural bioinformatics," which is a small part of computational and systems biology. Even within this group of scientists, where structure is central, the computational and systems biology communities are not yet taking full advantage of what we have. Most work is still performed using PDB files rather than XML files, and hence a lot of useful information goes unused by this community. This will change slowly as a generation of scientists more adept at dealing with XML starts to have a stronger voice in the community. In terms of the bigger picture, systems biology is in some sense the molecular simulation of the new age: rather than simulating the actions of a few molecules, we are simulating the actions of complete pathways, cells, and more. For now, at least, this work has largely bypassed structure, but I suspect that will change. The devil is in the details, and in the world of systems biology, structure may well provide those details. For example, much effort is currently going into mapping and describing the topologies of protein-protein interaction networks across a wide range of species and cell types. Eventually, it will be necessary to come back and explore specific interactions, and here structure will be important. The challenge for the RCSB PDB, then, is to make the details of those interactions available in a simple way.

You are very much involved with the Public Library of Science (PLoS), a nonprofit organization committed to making the world's scientific and medical literature freely available to the public. Why is that important to you?

My work with PLoS is similar to our work on the PDB, where we all work hard to make data freely available to the worldwide community. PLoS tries to do the same thing, but with the scientific literature. PLoS is a standard-bearer for the open access movement, and I am very passionate about it. Open access to the literature is a controversial issue, and I appreciate the many sides to the argument (which I will not get into here). It is important to me that anyone can read the results of my research, but I acknowledge that open access is a business model that is yet to be proven. Nevertheless, I believe there is one component of open access that is very important: open access is not just about access, but about copyright and format. Allowing anyone to use material from an article, provided they give appropriate attribution, opens up many possibilities when that information is marked up in XML and accessible online. My research group is experimenting with this through two NSF-funded pilot studies. The first is to integrate journal content with database content. Data and the knowledge derived from that data have traditionally been reported and kept separately (in databases and publications, respectively). There is no reason for this, and so we are trying to come up with ways to provide more seamless and useful access between data and literature. This would seem to be particularly relevant to structural biology. A PDF file is a pretty poor way to express the aspects of a structure-function relationship that need to be looked at graphically. Take a simple example: a reader could go to a paper, see a figure, and click on it. An identical version of that figure could then be launched in a form that could be rotated, annotated, and used to ask for more information. My lab is developing prototypes that fulfill this idea. The second effort integrates open access content with video.
We have developed a site called scivee.tv that attempts to cater to the upcoming YouTube generation of scientists. In "pubcasts," authors talk about their work in a video, which is then integrated with the open access content of their paper. Relevant parts of the paper can be highlighted as they speak. It remains to be seen whether scientists like this approach and whether it improves our ability to comprehend complex material. If the answer is "yes", it may be useful to incorporate these kinds of developments into the RCSB PDB.