Purdue University Graduate School

DEEP LEARNING ENABLED 3D PROTEIN STRUCTURE MODELING

Proteins are critical to all living systems, and understanding their structure is essential to interpreting their function. Predicting the 3D structure of proteins has therefore been a long-standing problem in protein bioinformatics. Cryo-electron microscopy (cryo-EM) is an emerging experimental technique for determining 3D protein structures. Although cryo-EM now routinely yields structures at near-atomic resolution, many maps are still determined at intermediate resolution, and extracting structural information from these maps is challenging. In this research, I report a new computational method, Emap2sec, which identifies protein secondary structures (α-helices, β-sheets, and other structures) in cryo-EM maps at intermediate resolutions of 5–10 Å. Emap2sec uses a 3D deep convolutional neural network to detect the secondary structure at each grid point in an EM map. Emap2sec clearly identified the secondary structures in many of the maps tested and performed substantially better than existing methods.
However, there are still many cases where even a resolution of 3–4 Å is not high enough to model molecular structures with standard computational tools. Because 3–4 Å is roughly the empirical borderline, a small improvement in map quality at this resolution can make structure modeling significantly easier. My subsequent work, EM-GAN, uses a novel method based on a 3D generative adversarial network (GAN) to generate an enhanced map from an existing cryo-EM map, thereby enabling improved protein structure modeling. EM-GAN is designed for EM maps in the 3–6 Å resolution range. EM-GAN was extensively tested on a dataset of 151 experimental EM maps and significantly improved protein structure modeling with the de novo modeling tools MAINMAST and Phenix.
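Emap2sec assigns a secondary structure class to each grid point of an EM map using a 3D CNN that looks at the density around that point. The sketch below illustrates only the input-preparation step under stated assumptions: each grid point above a density threshold is represented by a cube of voxels centred on it, with zero padding at the map edges. The function name `extract_patches`, the cube half-width, and the threshold are hypothetical choices for illustration, not Emap2sec's actual settings, and reading the map from a file is omitted.

```python
import numpy as np

def extract_patches(density, half=5, threshold=0.0):
    """Return (coords, patches) for every grid point with density > threshold.

    Each patch is a (2*half+1)^3 cube of voxels centred on that grid point;
    voxels outside the map are zero-padded. Parameter values are illustrative
    assumptions, not the settings of any published tool.
    """
    size = 2 * half + 1
    padded = np.pad(density, half, mode="constant", constant_values=0.0)
    coords = np.argwhere(density > threshold)  # (N, 3) voxel indices
    patches = np.empty((len(coords), size, size, size), dtype=density.dtype)
    for n, (i, j, k) in enumerate(coords):
        # Grid point (i, j, k) sits at (i+half, j+half, k+half) in `padded`,
        # so the cube centred on it starts at (i, j, k) in padded coordinates.
        patches[n] = padded[i:i + size, j:j + size, k:k + size]
    return coords, patches


# Toy map: a single dense voxel in an otherwise empty 8x8x8 box.
density = np.zeros((8, 8, 8), dtype=np.float32)
density[4, 4, 4] = 1.0
coords, patches = extract_patches(density, half=2, threshold=0.5)
print(coords.tolist(), patches.shape)  # [[4, 4, 4]] (1, 5, 5, 5)
```

In a full pipeline, each cube would then be passed to a 3D CNN that outputs class scores (helix, sheet, other) for its centre point; that network is not shown here.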
Lastly, my research on refining protein inter-residue contact maps, ContactGAN, uses a GAN framework to improve protein structure modeling starting from the amino acid sequence. When tested on the CASP13 and CASP14 datasets, ContactGAN substantially refined noisy contact maps generated by recent contact prediction methods. ContactGAN can be integrated into any structure prediction pipeline to achieve the end goal of improved protein structure prediction.
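ContactGAN takes a noisy inter-residue contact map as input and outputs a refined one. As a hedged illustration of what such a map is and how a refinement might be scored, the sketch below builds a binary contact map from per-residue coordinates and computes the top-k precision of a predicted (score) map. The 8 Å cutoff, the minimum sequence separation of 6, and top-k precision are a common CASP-style convention assumed here; they are not necessarily the exact definitions used for ContactGAN, and the function names are hypothetical.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Binary contact map from per-residue 3D coordinates (e.g. C-beta atoms).

    Assumed convention: residues i != j are in contact if their representative
    atoms lie closer than `cutoff` angstroms.
    """
    n = len(coords)
    return [[1 if i != j and math.dist(coords[i], coords[j]) < cutoff else 0
             for j in range(n)]
            for i in range(n)]


def top_k_precision(pred, true, k, min_sep=6):
    """Fraction of the k highest-scoring predicted pairs that are true contacts.

    `pred` holds contact scores (e.g. probabilities), `true` holds 0/1 labels.
    Only pairs with sequence separation j - i >= min_sep are ranked.
    """
    n = len(pred)
    pairs = [(pred[i][j], i, j) for i in range(n) for j in range(i + min_sep, n)]
    pairs.sort(reverse=True)  # highest score first; ties broken by index
    top = pairs[:k]
    if not top:
        return 0.0
    return sum(true[i][j] for _, i, j in top) / len(top)


# Toy usage: one true contact (0, 6); the noisy map ranks a false pair second.
L = 8
true = [[0] * L for _ in range(L)]
true[0][6] = true[6][0] = 1
pred = [[0.0] * L for _ in range(L)]
pred[0][6], pred[1][7], pred[0][7] = 0.9, 0.8, 0.1
print(top_k_precision(pred, true, k=2))  # 0.5
```

Under this convention, a refinement step would be judged helpful if the refined map scores a higher top-k precision (for example with k = L/5 for a sequence of length L) than the noisy map it started from.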

Degree Type

  • Doctor of Philosophy

Department

  • Computer Science

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Daisuke Kihara

Additional Committee Member 2

Alex Pothen

Additional Committee Member 3

Majid Kazemian

Additional Committee Member 4

Xavier Tricoche
