Purdue University Graduate School
2021.4.29 Kerstiens_Thesis.pdf (2.99 MB)


posted on 2021-05-05, 23:52 authored by Emily A Kerstiens

Bacteriophages are viruses that infect and kill bacteria. They are the most abundant organisms on the planet and the largest source of untapped genetic information. Every year, more bacteriophages are isolated from the environment, purified, and sequenced. Once sequenced, their genomes are annotated to determine the location and putative function of each gene expressed by the phage. Phages have been used in the past for genetic engineering, and new research is examining how they can be used for the treatment of disease, water safety, agriculture, and food safety.

Despite the influx of sequenced bacteriophages, a majority of the annotated genes encode hypothetical proteins, also known as No Known Function (NKF) proteins. They are expressed by the phages, but research has not identified a possible function. Wet lab research into the functions of the hundreds of NKF phage genes would be costly and could take years. Bioinformatics methods could be used to determine putative functions and functional categories for these hypothetical proteins. A new bioinformatics method using algorithms such as Domain Assignments, Hidden Markov Models, Structure Prediction, Sub-Cellular Localization, and iterative algorithms is proposed here. This new method was tested on the bacteriophage genome PotatoSplit and reduced the number of NKF genes from 57 to 40, for a total of 17 newly assigned functions. The functional class was identified for an additional six proteins, though no specific functions were named. Structure Prediction and Simulations were tested with a focus on two NKF proteins within lytic phages, and both returned possible functional categories with high confidence.

Additionally, this research focuses on the possibility of phage therapy and FDA regulation. A database of phage proteins was built and tested using R Statistical Analysis to determine proteins significant to phages infecting M. tuberculosis and to the lytic cycle of phages. The statistical methods were also tested on two further data sets: pharmaceutical products recalled by the FDA between 2012 and 2018, to determine ingredients/manufacturing steps that could affect product quality, and the FDA Adverse Event Reporting System (FAERS) data, to determine whether adverse event reports (AERs) could be used to judge the quality of a product. Many significant excipients/manufacturing steps were identified and used to score products on their quality. The AERs were evaluated on two case studies with mixed results.
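As an illustration of what "proteins significant to" a group of phages could mean in practice, the sketch below applies a one-sided Fisher's exact test to a 2x2 table of protein presence against group membership. This is an assumption made for illustration only: the abstract does not name the statistical test used, the thesis work was done in R rather than Python, and the function name and the counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test on a 2x2 table (illustrative only).

    a: phages in the group of interest that carry the protein
    b: phages in the group of interest that lack the protein
    c: other phages that carry the protein
    d: other phages that lack the protein

    Returns the probability, under the hypergeometric null, of seeing
    at least `a` carriers in the group of interest.
    """
    group_size = a + b          # phages in the group of interest
    carriers = a + c            # phages carrying the protein overall
    total = a + b + c + d       # all phages in the database
    max_a = min(group_size, carriers)

    denominator = comb(total, group_size)
    tail = sum(
        comb(carriers, k) * comb(total - carriers, group_size - k)
        for k in range(a, max_a + 1)
    )
    return tail / denominator


# Hypothetical counts, not taken from the thesis: a protein found in all
# 3 phages of the group of interest and in none of the 3 other phages.
p_value = fisher_exact_greater(3, 0, 0, 3)
print(p_value)  # 0.05
```

A small p-value here would suggest the protein is over-represented in the group of interest. With many proteins tested, a multiple-testing correction would normally be applied as well.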


Degree Type

  • Master of Science


Department

  • Agricultural and Biological Engineering

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Kari Clase

Additional Committee Member 2

Somali Chaterji

Additional Committee Member 3

Stephen Byrn