PDB COMMUNITY FOCUS: HARUKI NAKAMURA, PDBJ

Haruki Nakamura is the Director of the Protein Data Bank Japan (PDBj) in Osaka, Japan, and one of the founding members of the wwPDB. Born in Tokyo, Japan, he received his Doctor of Science degree in physics at the Faculty of Science, University of Tokyo. His doctoral research resulted in a thesis titled 'Dielectric studies of the biological polyelectrolytes by Fourier-synthesized-pseudorandom-noise-dielectric spectrometer.'

He began his post-graduate career as a Research Associate at the Department of Applied Physics, Faculty of Engineering, at the University of Tokyo. As a Visiting Researcher at the Astbury Department of Biophysics at Leeds University, Haruki studied molecular graphics in the laboratory of Professor A. C. T. North. He has also been a guest professor at a number of prestigious universities. In 1987, he became the Director of the Second Department, Protein Engineering Research Institute (PERI) in Osaka, studying molecular modeling, design and analysis, and in 1996, he was named Research Director, Department of Bioinformatics, Biomolecular Engineering Research Institute (BERI), the successor institute to PERI.

In addition to being the director of PDBj, Haruki is currently a Professor in the Laboratory of Protein Informatics, at the Research Center for Structural and Functional Proteomics, Institute for Protein Research, Osaka University.

Throughout his career, his research interests have included biophysical studies of protein architecture, electrostatic properties and enzymatic functions, protein modeling, protein design, computational chemistry, and structural bioinformatics.

Q: How did you come to be involved with the PDBj, and how has your own research influenced your vision for the PDBj?

A: The Institute for Protein Research (IPR) at Osaka University has collaborated with the PDB since its foundation in the 1970s. However, up until five years ago, the collaboration had been very limited due to little governmental support. The first thing I did after moving to IPR was to emphasize the importance of life science databases and bioinformatics technology, and to persuade the university and the government that IPR should contribute to the PDB database much more than before. Fortunately, my proposal to develop these areas (to curate, edit, and distribute structural data, develop a new XML format with an XML-based browser, develop several secondary databases, and start a mirror site of BMRB) was approved by the Japanese government, to accompany the structural genomics project in Japan. In order to promote all these activities, we founded a new organization called PDBj. The PDBj activity is not pure research, but provides many services to scientists, students, and general citizens all over the world; in particular, we have some responsibility for the Asia and Oceania regions. As a service provider, however, our knowledge areas now cover a much wider field: crystallography, NMR, informatics, graphics, Web technology, and so on. In particular, our experience developing the canonical PDBML in collaboration with the RCSB PDB has increased our skills in XML and grid computing. Development of our secondary databases provided a good opportunity to learn about integrating computational chemistry and information science.

Q: What are your long-term goals for PDBj, especially in light of the rapid changes taking place in structural biology?

A: With more protein and DNA structures being determined rapidly, every data curation and editing procedure should be automated as much as possible without sacrificing data quality. The raw experimental data should also be stored and distributed from the PDB. Thus, one of our long-term goals is to establish a stable, sustainable data management system that will not require much manual effort or large financial support. This may be realized with the rapid development of data grid technology, in which the distributed data yielded by structural biologists may be gathered and integrated over the Internet based on the grid architecture. Otherwise, database services may not be sustainable in society. The introduction of XML for description and validation of PDB data is the very first step toward this goal.

Q: What is the nature of your interactions with the RCSB PDB and with EBI-MSD and what effect, if any, does the formation of the wwPDB have on these interactions?

A: When we started PDBj, the collaborations with RCSB PDB and with EBI-MSD were essential, because PDBj is the newest entrant in this field. Therefore, the PDBj members have frequently visited both RCSB PDB and EBI-MSD, and we have also invited people from RCSB PDB and EBI-MSD, for example, when we organized international workshops at IPR, Osaka University. The foundation of the wwPDB should make these collaborations much tighter than before. I am looking forward to attending the first annual meeting of the wwPDB this year.

Q: Do you think that the wwPDB will be effective in providing for the long-term stability of the PDB archives?

A: Sure. Structural genomics projects and most structural biology research rely upon financial support from the governments of individual countries. Data management should thus be carried out by several different countries collaborating with each other. The foundation of the wwPDB is an essential step toward making the PDB a sustainable international database.

Q: The amount of biological structure data is increasing almost exponentially. From your experience, what do you see as the overall future for databases that have to deal with this explosion of data?

A: As mentioned previously, to be sustainable, a stable data management system in the future should not require much manual effort or very large financial support. It is inevitable that we introduce new procedures in which every data curation and editing step can be automated without losing any data quality. Development of PDBML for description and validation of PDB data is the very first step toward this goal. In the near future, the processes and workflow procedures should be described using the standard ML technology. For that goal, more collaboration with computer scientists is necessary.