File(s) under embargo
Reason: The methods are still under review and waiting for publication.
AI-powered systems biology models to study human disease
Rapid advances in high-throughput technology have given the biomedical research ecosystem highly scaled and commercialized data acquisition standards, providing an unprecedented opportunity to interrogate biology in novel and creative ways. In practice, however, unraveling high-dimensional data is difficult because of the following challenges: 1) how to handle outliers and data contamination; 2) how to address the curse of dimensionality; 3) how to utilize auxiliary information that is occasionally available, such as an external phenotype observation or spatial coordinates; 4) how to derive the unknown non-linear relationship between observed data and the underlying mechanisms in a complex biological system such as the human metabolic network.
In light of the above challenges, this thesis focuses on two research directions, for which we propose a series of statistical learning and AI-empowered systems biology models. The thesis is divided into two parts. The first part focuses on identifying latent low-dimensional subspaces in high-dimensional biomedical data. First, we propose CAT, a robust mixture regression method that detects outliers and estimates parameters simultaneously. Next, we propose CSMR, which uses penalized mixture regression to study the heterogeneous relationship between high-dimensional genetic features and a phenotype. Finally, we propose SRMR, which investigates mixture linear relationships over a spatial domain. The second part focuses on the non-linear relationships involved in estimating human metabolic flux in complex biological systems. We propose the first method in this domain that can robustly estimate the flux distribution of a metabolic network at the resolution of individual cells.