Purdue University Graduate School

TOPICS IN MULTIMODAL DATA INTEGRATION FOR BIOMEDICAL DISCOVERY

thesis
posted on 2025-11-17, 15:41 authored by Xiaochen Yang
Large-scale biomedical datasets have created unprecedented opportunities for understanding human health and disease, yet effectively integrating diverse data modalities to extract actionable insights remains challenging. This thesis addresses these challenges through three studies on multimodal data integration. The first study integrated genetic and imaging data by developing imaging-derived polygenic scores for 4,375 brain and body phenotypes from the UK Biobank. These scores successfully stratified disease risk for Alzheimer’s disease and multiple sclerosis across multiple cohorts, extending the utility of limited imaging data to broader populations. The second study integrated genomic data with clinical trial information, revealing that genes associated with two diseases were 7.46-fold more likely to be shared drug targets. This enrichment remained consistent across therapeutic areas, providing evidence-based guidance for drug repurposing strategies. The third study established theoretical foundations for integrating genetic and omic data in imputation-based mediation analysis. Using novel random matrix theory, this work demonstrated that ignoring prediction error leads to biased estimates and inflated false positives, particularly when genetic variants have direct phenotypic effects. Together, these studies advanced multimodal data integration approaches, maximizing the translational value of large-scale biomedical datasets.

Degree Type

  • Doctor of Philosophy

Department

  • Statistics

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Bingxin Zhao

Advisor/Supervisor/Committee co-chair

Fei Xue

Additional Committee Member 2

Faming Liang

Additional Committee Member 3

Vinayak Rao