Education Corner

Jessica A. Nash is a Software Scientist and the Education Lead at The Molecular Sciences Software Institute (MolSSI). She completed her PhD at North Carolina State University in Materials Science and Engineering, where she studied DNA nanomaterials using molecular dynamics simulations. At MolSSI, her work has focused on the development and improvement of software in the computational molecular sciences. She currently works as the lead developer for the web interface for MolSSI's project SEAMM (Simulation Environment for Atomistic and Molecular Modeling). As Education Lead for the Institute, she develops educational materials that help researchers write code and use computational molecular science software. You can see her programming projects on GitHub or follow her on Twitter.

Paul Craig is a chemistry professor at the Rochester Institute of Technology in Rochester, NY. His research group focuses on predicting protein function using computational and bench methods. This work led to the formation of the BASIL (Biochemistry Authentic Scientific Inquiry Lab) Course-based Undergraduate Research Experience, an NSF-funded project that now includes faculty and students on more than 20 campuses. He recently spent a semester working with Jessica Nash at the Molecular Sciences Software Institute and is currently on sabbatical at the RCSB Protein Data Bank at Rutgers University, where he is focusing on enhancing educational resources and user experiences. You can follow him on LinkedIn and Twitter.

On February 22, 2022, Paul Craig and Jessica Nash led a virtual workshop on Python Scripting for Biochemistry and Molecular Biology for 163 participants in a “Crash Course” hosted by the IQB (Institute for Quantitative Biomedicine) at Rutgers University.

Jupyter book for the full Python Scripting for Biochemistry & Molecular Biology Workshop

This workshop was presented in a live-coding format in which participants code along with the instructors. In addition to the instructors, four colleagues (Julia Koeppe, SUNY-Oswego; Sam Ellis, MolSSI; Joey Lubin, Rutgers-IQB; and Charlie Weiss, Augustana College) generously provided support to help participants solve short-term problems so they could jump back into the fray. Attendees were roughly 42% graduate students, 19.2% faculty, 12% postdoctoral researchers, and 12% undergraduates. Of the 163 people in the course, 125 completed the attendance survey.

We believe that all scientists need to learn computer scripting/coding skills to remain competitive. Most undergraduate programs do not include coding skills in coursework for biology or chemistry majors, yet we hear of a need for basic coding skills from graduates who enter industry and those who go on to graduate school. The goal of this workshop was to introduce scientists from all professional levels (undergraduate, graduate, postdoc, faculty member, industrial scientist) to the use of Python programming in Jupyter notebooks, thereby enabling them to start taking advantage of computational power and flexibility that far exceed those of the data analysis and display tools found in Microsoft Excel and Apple Numbers.

Lesson 1. Introduction to Python and Jupyter Notebooks. We learned some basic Python syntax, data types, data structures, ‘for loops’ and logic operations.
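
For readers who did not attend, the kind of syntax covered in Lesson 1 can be sketched as follows. This is a minimal illustration, not the workshop's own notebook: the residue names in the list are made up, and the variable names are hypothetical.

```python
# A list (data structure) of strings (data type); the values are made up.
residues = ["ALA", "GLY", "HOH", "LEU", "HOH", "LIG"]

water_count = 0   # an integer counter
non_water = []    # an empty list to fill inside the loop

# A 'for loop' visits each item; the if/else is a simple logic operation.
for name in residues:
    if name == "HOH":
        water_count += 1
    else:
        non_water.append(name)

print(f"{water_count} waters, {len(non_water)} other residues")
```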

Lesson 2. File Parsing. We imported our first Python library, os. We then opened PDB entry 4EYR (Crystal structure of multidrug-resistant clinical isolate 769 HIV-1 protease in complex with ritonavir) and used Python code to find and extract the abbreviation for ritonavir found in the file.
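
One way to approach this kind of file parsing is sketched below. It assumes the ligand name appears in a HETNAM record, with the three-character ligand ID in columns 12-14 as laid out in the PDB file format. The sample lines are a hand-written stand-in rather than text copied from entry 4EYR, and the function name find_ligand_ids is hypothetical; the workshop's own code may differ.

```python
import os
import tempfile

# Hand-written stand-in for PDB-format lines (an assumption, not copied from 4EYR).
sample_lines = [
    "HEADER    EXAMPLE ENTRY",
    "HETNAM" + " " * 5 + "RIT RITONAVIR",
    "END",
]

def find_ligand_ids(pdb_path, ligand_name):
    """Return ligand IDs from HETNAM records whose text mentions ligand_name."""
    ids = []
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith("HETNAM") and ligand_name.upper() in line.upper():
                # Columns 12-14 (1-based) hold the ligand ID in the PDB format.
                ids.append(line[11:14].strip())
    return ids

with tempfile.TemporaryDirectory() as data_dir:
    # os.path.join builds a path that works on any operating system.
    pdb_path = os.path.join(data_dir, "sample.pdb")
    with open(pdb_path, "w") as handle:
        handle.write("\n".join(sample_lines) + "\n")
    ligand_ids = find_ligand_ids(pdb_path, "ritonavir")

print(ligand_ids)
```

With a real download of the entry, the same function could be pointed at that file instead of the temporary sample.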

Lesson 3. Processing Multiple Files. We added another Python library, glob, and used a nested ‘for loop’ to extract RESOLUTION information from a group of 10 PDB files (though the same code would work for 10,000 files). We completed that lesson by writing our findings to an appropriately formatted text file.
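
The nested-loop pattern described in Lesson 3 might look like the sketch below: the outer loop walks over files found by glob, and the inner loop walks over the lines of each file. To keep the example self-contained, it first writes three small stand-in files; their names and resolution values are made up, and the output file name resolutions.txt is hypothetical.

```python
import glob
import os
import tempfile

# Made-up entries and resolutions, for illustration only.
fake_entries = {"aaaa": 1.80, "bbbb": 2.15, "cccc": 1.42}

with tempfile.TemporaryDirectory() as data_dir:
    # Write stand-in files that contain a RESOLUTION remark line.
    for entry_id, resolution in fake_entries.items():
        with open(os.path.join(data_dir, f"{entry_id}.pdb"), "w") as handle:
            handle.write("HEADER    EXAMPLE\n")
            handle.write(f"REMARK   2 RESOLUTION.    {resolution:.2f} ANGSTROMS.\n")

    results = {}
    # Outer loop: every .pdb file that glob finds in the directory.
    for path in sorted(glob.glob(os.path.join(data_dir, "*.pdb"))):
        entry_id = os.path.basename(path).split(".")[0]
        with open(path) as handle:
            # Inner loop: every line in the current file.
            for line in handle:
                if line.startswith("REMARK") and "RESOLUTION." in line:
                    # Split on whitespace; the fourth word is the number.
                    results[entry_id] = float(line.split()[3])
                    break

    # Write one tab-separated line per file to a text report.
    out_path = os.path.join(data_dir, "resolutions.txt")
    with open(out_path, "w") as out:
        for entry_id, resolution in results.items():
            out.write(f"{entry_id}\t{resolution:.2f}\n")
    with open(out_path) as handle:
        report = handle.read()

print(report)
```

Because glob returns however many files match the pattern, the loop body does not change when the directory holds 10 files or 10,000.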

Lesson 4. Visualizing Structures in Jupyter Notebooks. We added one more library, nglview, which contains tools for loading PDB files, applying many different representations, and saving publication-quality graphics. Following the live-coding lesson, everyone was challenged to load a protein of interest from the PDB, color the protein by secondary structure, change the representation of water, and selectively change the way atoms in rings are displayed.
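
A rough sketch of part of that challenge is shown below, assuming nglview is installed and the notebook has network access. The helper names build_representations and show_entry are hypothetical, and the particular representation choices (cartoon colored with the "sstruc" scheme, semi-transparent spacefill water, licorice ligand) are illustrative assumptions rather than the workshop's solution. Ring-specific display is not attempted here.

```python
def build_representations():
    """Return representation settings as plain dictionaries (illustrative choices)."""
    return [
        # Color the protein cartoon by secondary structure.
        {"repr_type": "cartoon", "selection": "protein", "color": "sstruc"},
        # Show water as semi-transparent spheres.
        {"repr_type": "spacefill", "selection": "water", "opacity": 0.5},
        # Show bound ligands as sticks.
        {"repr_type": "licorice", "selection": "ligand"},
    ]

def show_entry(pdb_id):
    """Fetch a PDB entry and apply the representations above."""
    import nglview  # imported here so the settings can be inspected without it

    view = nglview.show_pdbid(pdb_id)
    view.clear_representations()
    for spec in build_representations():
        spec = dict(spec)  # copy so the original settings are not modified
        view.add_representation(spec.pop("repr_type"), **spec)
    return view

# In a Jupyter notebook cell, the view is displayed when it is the last expression:
# show_entry("4eyr")
```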

Visualizing Structures in Jupyter Notebooks with NGL

At the conclusion of the workshop, 78 participants completed the exit survey, indicating an overall satisfaction rating of 8.7/10. Here are a few quotes from participants.

An extremely generous effort to introduce Python and Jupyter notebook to write scripts for computational structural biology projects. A four-hour crash course that motivates me to learn more. Thank you to Jessica, Paul, and Stephen for their time.

This workshop was extremely well done. I learned much more than I ever expected and the Jupyter Notebooks created will be a great resource for future learning and experimentation. THANK YOU!

Great way to start with Python for scientists! Very good mixture of theory and exercises.

It was great to have some practical, hands-on experience using Python. I would have had a lot of trouble learning how to do all of this myself. This truly opens up new areas of research--both for me and my students.

The course was intended to whet the appetites of participants and to help them see how these skills could open new vistas in their research and teaching. Next spring, we are planning another IQB Crash Course on Python Scripting where we will focus on data analysis using the pandas library and plotting using the matplotlib and seaborn libraries. We are also planning a full 12-hour workshop later in 2022 (dates to be determined). In addition to this workshop, the MolSSI education resources include workshops on Data Analysis with Python, Scientific Data Visualization Using Python, Best Practices in Python Programming, Machine Learning, High Performance Computing, and Molecular Modeling.

To learn more and gain access to future workshops, you can join the MolSSI mailing list.

In addition, explore these links to review the course materials and expand your knowledge of Python Scripting.

Python Scripting for Biochemistry & Molecular Biology

Jessica A. Nash, PhD (MolSSI) and Paul A. Craig, PhD (Rochester Institute of Technology)