DEEP LEARNING ENABLED 3D PROTEIN STRUCTURE MODELING
Proteins are critical to all living systems, and understanding their structure is essential to interpreting their function. Predicting the 3D structure of proteins has therefore been a long-standing problem in protein bioinformatics. Cryo-electron microscopy (cryo-EM) is an emerging experimental technique for determining the 3D structures of proteins. Although cryo-EM now routinely reports structures at near-atomic resolution, many maps are still determined at intermediate resolution, and extracting structural information from these maps is a challenge. In this research, I report a new computational method, Emap2sec, which identifies the secondary structures of proteins (α-helices, β-sheets, and other structures) in cryo-EM maps at intermediate resolutions of 5–10 Å. Emap2sec uses a 3D deep convolutional neural network to detect the secondary structure at each grid point in an EM map. Emap2sec clearly identified the secondary structures in many of the maps tested and performed substantially better than existing methods.

However, there are still many cases where even a resolution of 3–4 Å is not high enough to model molecular structures with standard computational tools. When the resolution of a map is near this empirical borderline of 3–4 Å, even a small improvement can make structure modeling significantly easier. My subsequent work, EM-GAN, uses a novel 3D generative adversarial network (GAN) to generate an enhanced EM map from an existing cryo-EM map, thereby enabling improved protein structure modeling. EM-GAN is designed for EM maps in the 3–6 Å resolution range. It was extensively tested on a dataset of 151 experimental EM maps and significantly improved the protein structures built with the de novo modeling tools MAINMAST and Phenix.
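To illustrate what per-grid-point prediction means for Emap2sec, the sketch below visits every grid point of a density map, extracts the local cube of density around it, and passes that cube to a classifier. It is a simplified sketch under stated assumptions, not Emap2sec's implementation. The patch radius `r`, the density `threshold` used to skip empty grid points, and `toy_classifier` (a mean-density rule standing in for the trained 3D CNN) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A cryo-EM density map stored as a 3D grid: density[x][y][z]
Grid = List[List[List[float]]]


def extract_patch(density: Grid, x: int, y: int, z: int, r: int) -> List[float]:
    """Collect the (2r+1)^3 density values centred on grid point (x, y, z).

    Positions that fall outside the map are padded with 0.0.
    """
    nx, ny, nz = len(density), len(density[0]), len(density[0][0])
    patch = []
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            for dz in range(-r, r + 1):
                i, j, k = x + dx, y + dy, z + dz
                if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
                    patch.append(density[i][j][k])
                else:
                    patch.append(0.0)
    return patch


def label_grid(
    density: Grid,
    classify: Callable[[List[float]], str],
    r: int = 1,
    threshold: float = 0.0,
) -> Dict[Tuple[int, int, int], str]:
    """Assign a class to every grid point whose density exceeds `threshold` (assumed rule)."""
    labels = {}
    for x in range(len(density)):
        for y in range(len(density[0])):
            for z in range(len(density[0][0])):
                if density[x][y][z] > threshold:
                    labels[(x, y, z)] = classify(extract_patch(density, x, y, z, r))
    return labels


def toy_classifier(patch: List[float]) -> str:
    """Hypothetical stand-in for a trained 3D CNN: thresholds the mean patch density."""
    mean = sum(patch) / len(patch)
    if mean > 0.5:
        return "helix"
    if mean > 0.2:
        return "sheet"
    return "other"


# Usage: a 5x5x5 map containing a 3x3x3 block of density 1.0 at indices 1..3
n = 5
demo = [
    [[1.0 if all(1 <= c <= 3 for c in (x, y, z)) else 0.0 for z in range(n)] for y in range(n)]
    for x in range(n)
]
labels = label_grid(demo, toy_classifier, r=1)
print(len(labels), labels[(2, 2, 2)], labels[(1, 1, 1)])
```

Replacing `toy_classifier` with a trained network is where the learned model would enter. The patch extraction and the per-point loop stay the same, which is what makes the output a secondary-structure assignment for every grid point rather than for the map as a whole.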
Lastly, ContactGAN, my work on refining protein inter-residue contact maps, uses a GAN framework to improve protein structure modeling starting from the amino acid sequence. When tested on the CASP13 and CASP14 datasets, ContactGAN substantially refined the noisy contact maps generated by recent contact prediction methods. ContactGAN can be integrated into a structure prediction pipeline to achieve the end goal of improved protein structure prediction.
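Because ContactGAN operates on inter-residue contact maps, it may help to see what that data structure holds. The sketch below derives a binary contact map from known 3D coordinates using the convention common in CASP assessments: two residues are in contact when their Cβ atoms (Cα for glycine) are closer than 8 Å, usually counting only pairs at least six positions apart in sequence. It illustrates the data structure under those assumptions and is not part of ContactGAN. The function name `contact_map` and its parameters are hypothetical. The predicted maps that ContactGAN refines typically hold contact probabilities rather than 0/1 values.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(coords: Sequence[Coord], cutoff: float = 8.0, min_sep: int = 6) -> List[List[int]]:
    """Return a binary, symmetric inter-residue contact map.

    coords[i] is the representative atom of residue i (C-beta, or C-alpha for glycine).
    Residues i and j are in contact if their distance is below `cutoff` (angstroms)
    and they are at least `min_sep` positions apart in sequence.
    """
    if min_sep < 1:
        raise ValueError("min_sep must be >= 1 so a residue is never paired with itself")
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        # Only scan j > i and mirror the result, since distance is symmetric
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Usage: four residues on a straight line; min_sep=1 so sequence neighbours are counted
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(toy, min_sep=1)
for row in cmap:
    print(row)
```

With the default `min_sep=6`, this four-residue toy chain has no countable pairs at all, which is why the usage example lowers it.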
Degree Type
- Doctor of Philosophy
Department
- Computer Science
Campus location
- West Lafayette