File(s) under embargo

Reason: Some of the content in this thesis is under review.






Machine Learning Approaches to Reveal Discrete Signals in Gene Expression

posted on 25.04.2022, 14:31 by Changlin Wan

Gene expression is an intricate process that determines cell types and functions in metazoans. Most of its regulation is communicated through discrete signals, such as whether the DNA helix is open or whether an enzyme binds its target. Understanding the regulatory signals behind selective expression is essential to a full comprehension of biological mechanisms and complex biological systems. In this research, we seek to reveal the discrete signals in gene expression using novel machine learning approaches. Specifically, we focus on two types of data: chromatin conformation capture (3C) and single-cell RNA sequencing (scRNA-seq). To identify potential regulators, we use a new hypergraph neural network to predict genome interactions, and we find that gene co-regulation may result from shared enhancer elements. To reveal discrete expression states from scRNA-seq data, we propose a novel model called LTMG that accounts for biological noise and shows better goodness of fit than existing models. Next, we apply Boolean matrix factorization to the identified expression states to find co-regulation modules, revealing a general property shared by cancer cells across different patients. Lastly, to find more reliable modules, we analyze the bias in the data and propose BIND, the first algorithm to quantify column- and row-wise bias in a binary matrix.
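
As a rough illustration of the Boolean matrix factorization idea mentioned in the abstract, the sketch below builds a small binary "expression state" matrix as the Boolean product of two binary factors. It is not the algorithm used in the thesis: the toy matrices, the gene/cell/module interpretation, and the helper names `boolean_product` and `reconstruction_error` are all assumptions made for illustration.

```python
import numpy as np


def boolean_product(A, B):
    """Boolean matrix product: entry (i, j) is 1 if A[i, k] AND B[k, j] holds for any k."""
    return ((A.astype(int) @ B.astype(int)) > 0).astype(int)


def reconstruction_error(X, A, B):
    """Number of entries where X differs from the Boolean product of A and B."""
    return int(np.sum(X != boolean_product(A, B)))


# Hypothetical toy data: 4 genes, 2 modules, 5 cells.
# A[i, k] = 1 means gene i belongs to module k.
A = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])
# B[k, j] = 1 means module k is active in cell j.
B = np.array([[1, 1, 0, 0, 1],
              [0, 0, 1, 1, 1]])

# Binary matrix implied by the two factors (genes x cells).
X = boolean_product(A, B)
print(X)
print("mismatches:", reconstruction_error(X, A, B))
```

In the last cell both modules are active, yet the entries stay at 1 rather than summing to 2; this OR-style combination is what distinguishes a Boolean factorization from an ordinary one. A factorization method would search for `A` and `B` given only `X`, which this sketch does not attempt.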


Degree Type

Doctor of Philosophy

Department

Electrical and Computer Engineering

Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

Chi Zhang

Advisor/Supervisor/Committee co-chair

Mireille Boutin

Additional Committee Member 2

Edward J Delp

Additional Committee Member 3

Zina Ben-Miled