<p dir="ltr">Large-scale biomedical datasets have created unprecedented opportunities to understand human health and disease, yet effectively integrating diverse data modalities to extract actionable insights remains challenging. This thesis addresses these challenges through three studies on multimodal data integration. The first study integrated genetic and imaging data by developing imaging-derived polygenic scores for 4,375 brain and body phenotypes from the UK Biobank. These scores stratified risk of Alzheimer’s disease and multiple sclerosis across multiple cohorts, extending the utility of limited imaging data to broader populations. The second study integrated genomic data with clinical trial information, revealing that genes associated with two diseases were 7.46-fold more likely to be drug targets shared between those diseases. This enrichment was consistent across therapeutic areas, providing evidence-based guidance for drug repurposing strategies. The third study established theoretical foundations for integrating genetic and omic data in imputation-based mediation analysis. Using novel results from random matrix theory, this work demonstrated that ignoring the prediction error in imputed omic data leads to biased estimates and inflated false-positive rates, particularly when genetic variants have direct effects on the phenotype. Together, these studies advance approaches to multimodal data integration and increase the translational value of large-scale biomedical datasets.</p>