This year, the National Science Foundation (NSF) is celebrating its 75th anniversary. NSF support was essential in the original development of BASIL (Biochemistry Authentic Scientific Inquiry Lab). Ongoing NSF support over the past ten years has enabled the BASIL community to grow in numbers and in collaboration with other teacher/scholar teams who are seeking to change undergraduate biochemistry education. At the same time, NSF has also supported our most critical online resource, the RCSB Protein Data Bank, which has always provided the structures that we study and, increasingly, provides the tools that our students use to explore these structures and predict their function.
BASIL began as a research project led by Paul Craig (Rochester Institute of Technology) and Herbert J. Bernstein (Dowling College) when our students developed the ProMOL plugin for PyMOL[1] and began using it to explore proteins of unknown function in the PDB, which had primarily been generated by the Structural Genomics Initiative [2,3]. Brett Hanson and Charlie Westin designed ProMOL to search for enzyme active sites in protein structures using about thirty annotated motifs from the Mechanism and Catalytic Site Atlas (M-CSA) [4]. As we developed ProMOL, we used hundreds of structures from the RCSB PDB to create a library of well-annotated enzyme active site templates. We then discovered that the Structural Genomics Initiative had deposited >4000 protein structures that lacked functional annotation. The undergraduate research students were able to suggest functions for these proteins by using their active site library, along with results from BLAST[5], Pfam[6], and DALI[7]. They then began testing these predicted functions in our research lab. Figure 1 illustrates an example of an active site alignment done in ProMOL. Similar alignments were found for more than 50 proteins of unknown function in the RCSB PDB[8].
Figure 1. Alignment for a serine protease with ProMOL and PyMOL. Alignment of PDB entry 1AFQ (bovine gamma chymotrypsin; the query, in red) with a motif template based on 1A0J (in white), a trypsin structure from Atlantic salmon. Three residues from 1AFQ (His 57, Asp 102 and Ser 195) aligned with the three homologous residues from 1A0J (His 57, Asp 102 and Ser 195).
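The idea behind an active site match such as the one in Figure 1 can be illustrated with a small geometric check. The sketch below is not ProMOL's algorithm; it only compares the distances between three residue positions in a query against the same distances in a template. The function names, the 1.0 Å tolerance, and the coordinates are all hypothetical values chosen for illustration, not measurements from 1AFQ or 1A0J.

```python
from itertools import combinations
from math import dist


def distance_signature(coords):
    """Pairwise distances between residue positions, kept in input order."""
    return [dist(a, b) for a, b in combinations(coords, 2)]


def matches_template(query, template, tolerance=1.0):
    """True if every inter-residue distance agrees within `tolerance`.

    Residues must be listed in the same role order in both inputs
    (for example His, Asp, Ser). The tolerance is an assumed value.
    """
    if len(query) != len(template):
        return False
    pairs = zip(distance_signature(query), distance_signature(template))
    return all(abs(q - t) <= tolerance for q, t in pairs)


# Toy coordinates (invented for this example, in arbitrary units).
template = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
shifted_query = [(x + 10.0, y - 2.0, z + 5.0) for x, y, z in template]
distorted_query = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 9.0, 0.0)]

print(matches_template(shifted_query, template))    # same geometry, moved
print(matches_template(distorted_query, template))  # one residue displaced
```

Because internal distances do not change when a structure is translated or rotated, this check needs no superposition step. It cannot distinguish mirror-image arrangements, which is one reason real tools rely on fuller atom-level alignment.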
After six or seven years of funding from NIH, our program officer encouraged us to move our project to the Division of Undergraduate Education (DUE) at NSF. The DUE focuses on how to improve STEM education and engage more undergraduate students in STEM. Although we had missed the deadline, NSF allowed us to submit an off-cycle proposal to the IUSE (Improving Undergraduate STEM Education) program. We did not receive funding, but did receive excellent feedback that led to funding the following year. NSF support since then has enabled us to develop our initial project into the BASIL curriculum and to validate its impact on student learning[9–11]. We are currently offering workshops (listed on the BASIL website) to help people adopt the BASIL curriculum and to promote valid assessment of student learning using the BASIL CURE (Course-based Undergraduate Research Experience).
Figure 2. A flowchart describing the BASIL approach to protein function prediction.
Students work with proteins that have known structures but lack functional annotation. A search of the RCSB PDB with the term “unknown function” revealed about 4000 hits when we started this project in 2012, with many more proteins now without an assigned function due to the rapid increase in computationally solved structures. Continuous NSF support of this public protein repository allows any student at any university to undertake this kind of research project. Initial versions of the BASIL curriculum used a plugin for PyMOL called ProMOL [1] that enabled users to compare a query protein against a library of several hundred enzyme active sites based on the Mechanism and Catalytic Site Atlas [4]. Initial searches revealed about a dozen of these proteins of unknown function as potential serine hydrolases, so we chose that as the starting point for BASIL. We found that plasmids containing the genes for most of these proteins were available at the DNASU Plasmid Repository. One member of the BASIL core team, Mike Pikaart (Hope College), worked with DNASU to create the BASIL starter pack, a collection of ten proteins that can be ordered for minimal cost. Details of the clones and their associated PDB files are included in Table 1.
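A text search like the “unknown function” query described above can also be scripted. The following is a minimal sketch, assuming the RCSB PDB Search API accepts a JSON full-text query at the endpoint shown and reports a `total_count` field in its response; the endpoint, field names, and helper function names are assumptions to be checked against the current RCSB documentation, not part of the BASIL curriculum.

```python
import json
import urllib.request

# Assumed endpoint for the RCSB PDB Search API; verify before relying on it.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_full_text_query(term, rows=10):
    """Build a JSON-serializable full-text query for PDB entries."""
    return {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": term},
        },
        "return_type": "entry",
        "request_options": {"paginate": {"start": 0, "rows": rows}},
    }


def count_hits(term):
    """Send the query and return the reported number of matching entries.

    Requires network access; the `total_count` field name is an assumption.
    """
    payload = json.dumps(build_full_text_query(term, rows=1)).encode("utf-8")
    request = urllib.request.Request(
        SEARCH_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["total_count"]


# Inspect the request body without contacting the server.
print(json.dumps(build_full_text_query("unknown function"), indent=2))
```

Calling `count_hits("unknown function")` would return the current number of matches, which will differ from the roughly 4000 hits seen in 2012.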
Table 1. The BASIL Starter Kit contains plasmids which can be used to overexpress the indicated proteins. It is available for purchase at modest cost from DNASU.
| Clone ID | Species | PDB ID |
|---|---|---|
| LiCD00311532 | Listeria innocua | 3DS8 |
| RrCD00335772 | Cupriavidus pinatubonensis JMP134 | 3B7F |
| BsCD00370437 | Bacillus subtilis subsp. subtilis str. 168 | 3CBW |
| SaCD00432683 | Staphylococcus aureus subsp. aureus Mu50 | 3H04 |
| EfCD00450400 | Enterococcus faecalis V583 | 3L1W |
| BsCD00531324 | Bacillus subtilis | 2O14 |
| UnCD00534390 | Unknown | 3FEQ |
| EfCD00584424 | Enterococcus faecalis V583 | 2QRU |
| UnCD00663579 | Unknown | 4DIU |
| CpCD00696668 | Chitinophaga pinensis DSM 2588 | 4Q7Q |
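For readers who want to fetch the structure files for the proteins in Table 1, the short sketch below maps each clone ID to its PDB ID and builds a download address. It assumes the RCSB file server pattern `https://files.rcsb.org/download/<ID>.pdb`; the dictionary and function names are hypothetical helpers, not part of the BASIL materials.

```python
# Clone ID -> PDB ID, copied from Table 1.
STARTER_KIT = {
    "LiCD00311532": "3DS8",
    "RrCD00335772": "3B7F",
    "BsCD00370437": "3CBW",
    "SaCD00432683": "3H04",
    "EfCD00450400": "3L1W",
    "BsCD00531324": "2O14",
    "UnCD00534390": "3FEQ",
    "EfCD00584424": "2QRU",
    "UnCD00663579": "4DIU",
    "CpCD00696668": "4Q7Q",
}


def structure_url(pdb_id, file_format="pdb"):
    """Return the assumed RCSB download URL for a PDB entry."""
    if file_format not in ("pdb", "cif"):
        raise ValueError("file_format must be 'pdb' or 'cif'")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.{file_format}"


# Print one download address per starter kit clone.
for clone_id, pdb_id in STARTER_KIT.items():
    print(clone_id, structure_url(pdb_id))
```

The resulting addresses can be opened in a browser or passed to any download tool, so students can work from the same coordinates regardless of which viewer they use.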
The BASIL curriculum was originally designed to have students begin with the computational modules, formulate a hypothesis about the function of the protein, and then move to the wet lab to express, purify and characterize the protein. The current computational modules include sequence analysis with BLAST and InterPro [12], active site alignment with SPRITE [13], a full backbone alignment with DALI [7], and docking with AutoDock Vina on the SwissDock website [14]. Recently we have developed modules based on two AI tools: CLEAN [15], which predicts the Enzyme Commission class for the protein based on its sequence, and Foldseek [16], which uses a unique algorithm to perform rapid full backbone alignments of submitted structures.
BASIL instructors are encouraged to adapt the curriculum to work on their campuses and a number of implementations other than the original approach described in the preceding paragraph have been successful [17]:
During the pandemic, a number of colleagues contacted the BASIL team and implemented the computational modules as a fully online biochemistry lab course in response to required emergency remote instruction; others had their students complete all of the modules during the pandemic. The computational modules transitioned readily to fully online implementation. Students completed the wet labs by working with existing data (SDS-PAGE gels, protein assays, enzyme activity assays) and we found they achieved most of the desired learning objectives [18]. Arthur Sikora (Nova Southeastern University) is developing a fully virtual BASIL curriculum that is currently being tested at Nova and Ursinus College.
The BASIL curriculum is fully open source. In addition, instructors can request access to resources that include teaching guidelines, answers to assessments, explanations of the data analysis, and alternative protocols if they lack access to an instrument. For example, we provide alternatives to sonication for cell lysis, and methods for making your own auto-induction medium in order to control costs.
The BASIL curriculum currently has 11 modules, six of which utilize the PDB extensively. Some modules require data from the PDB in order to characterize a protein of interest, while others rely on tools that are based on the extensive amount of data about protein structure that is available at the PDB. BASIL modules relying on the PDB include:
As BASIL has grown, we have focused on making our resources fully accessible to all institutions. One of our major steps has been to move from requiring users to install software to using fully online web applications so that students and faculty can access them easily. The team of software engineers and designers at the RCSB PDB have developed multiple resources that are helpful in curriculum development and implementation.
Ongoing support for RCSB PDB provided by the NSF, NIH, and DOE has been essential for the development and maintenance of the BASIL curriculum. Faculty and students benefit from access to this free digital data resource, with the BASIL curriculum being just one example of how NSF support has enhanced STEM education.
Examples of student work using the BASIL curriculum can be found online. One set of student work is housed on Proteopedia (Table 2), where students curate their protein function identification work. Two examples of Proteopedia pages of student work are provided. Students also present their work at a variety of conferences, with abstracts provided for 5 such presentations (Table 3).
Table 2. Student generated resources housed on Proteopedia.
| PDB ID | Topic | Report |
|---|---|---|
| 3HDT | This structure is described as a putative kinase and we have shown it has limited activity as a cytidylate kinase | |
| 3R8E | A novel glucose kinase | |
Table 3. Student presentations at national scientific conferences.
| PDB ID | Topic | Meeting Abstract |
|---|---|---|
| | Novel carboxylesterase protein function | |
| 2O14, 3H04 | GDSL lipase/esterase family alpha/beta hydrolase family | |
| 3R8E | Novel glucose kinase | |
| 1ZBS, 3DNU | Predicted to be an N-acetyl-glucosamine (NAG) kinase. | |
| | General presentation of multiple putative kinases | |
Building and Maintaining the BASIL Community
The BASIL Curriculum has two distinct groups of users. One group comprises faculty exploring biochemistry laboratory curriculum ideas. The curriculum is available free of charge, including 11 student modules, instructor resource documents, and assessment questions for each module. There is a BASIL users Slack channel, where faculty support each other with curriculum adoption, troubleshooting, and updates. The other user group for the BASIL curriculum comprises students being asked to use one or more modules in a course. Students tend to access only the modules page, downloading modules when directed to by an instructor. Students are not given access to instructor resources and are not members of the Slack channel. A Level 1 IUSE grant from the NSF supported the initial development of the BASIL curriculum modules. Ongoing NSF support has allowed the BASIL curriculum to be shared with 140 campuses; portions or all of the curriculum have been adopted at >50 of them. It has also supported assessing and refining the effectiveness of the BASIL curriculum, with the goal of improving STEM education and enhancing scientific workforce development.
As BASIL has grown, the NSF requirement for a sustainability plan has helped shape how the community is maintained. A steering committee now focuses on high-level issues and long-term goals, with input from a core team of BASIL faculty. BASIL adopters with an interest in the management of BASIL are invited to join the core team, helping sustain BASIL in the long term. Two committees support instructors, one focused on onboarding for new users and one focused on supporting all instructors (including workshops about the various computational tools, assessment of student work, and other emerging topics). An assessment committee focuses on both assessing student work and evaluating the effectiveness of the BASIL curriculum as a tool in biochemistry education. A modules development committee makes sure the curriculum is updated regularly, develops new modules, and identifies modules needing to be retired. A data management team helps with curating data and with maintaining the BASIL website, so the resources remain freely available.
The relationship between BASIL and the NSF is critical for the BASIL community. Were the PDB not available for any length of time, faculty using the BASIL curriculum would not be able to implement it. PDB structures are used in all of our computational modules and many of the wet lab modules. Members of the BASIL community in turn work with the PDB on providing feedback about various tools and developments that would be of use to the academic community. BASIL students and faculty have participated in summer PDB workshops at Rutgers, twice in person and once online. The support of the NSF for both BASIL and the PDB has been essential, as neither would have flourished without that support. The support of the NSF IUSE program has driven the growth and development of the BASIL curriculum and community.
With our current NSF support for BASIL, we continue to build the community by providing open source materials. This includes virtual workshops throughout the year, as well as the more intensive BASIL week offerings each summer. Planning is underway for an online 10th anniversary BASIL celebration from June 9-13, 2025, bringing together faculty to improve STEM education and the BASIL curriculum (https://www.basilbiochem.org/basil-week-2025). Expansion of the BASIL curriculum is underway, including a potential new module focused on Molecular Dynamics simulations to help assign protein functions. Community members are also working on Python scripting for some of the computational modules and data analysis for the wet lab modules. There may soon be enough computational modules to develop a full semester computational version of BASIL. Individuals are welcome to contribute their skills and interests to BASIL, including developing methods for studying non-hydrolase enzyme classes, assigning functions for non-enzymatic proteins, and the development of novel modules that focus on fluorescence, surface plasmon resonance, and the synthesis of novel substrates.
Paul A. Craig has been a professor of chemistry at the Rochester Institute of Technology since 1993. His research spans computational biochemistry, biochemistry education, and protein function prediction. He has been working with the BASIL team since 2015.
Bonnie L. Hall is a professor of Chemistry at Grand View University. She is interested in protein function, including predicting new functions, enzyme engineering, and machine learning approaches. She has been with the BASIL team since 2018.
Join us!
If you are interested in learning more about BASIL and the BASIL curriculum