Purdue University Graduate School

Harnessing Transfer Learning and Image Analysis Techniques for Enhanced Biological Insights: Multifaceted Approaches to Diagnosis and Prognosis of Diseases

Version 2 2024-04-28, 00:20
Version 1 2024-04-22, 15:07
posted on 2024-04-28, 00:20 authored by Ziyu Liu

Despite the remarkable advances of machine learning (ML) in biomedical research, especially in tackling complex human diseases such as cancer and Alzheimer's disease, a considerable gap persists between promising theoretical results and dependable clinical applications in diagnosis, prognosis, and therapeutic decision-making. One primary challenge is the lack of large, high-quality patient datasets, owing to the cost and labor of collecting them and the scarcity of patient samples. Moreover, the inherent complexity of the data often yields a feature space whose dimension is large relative to the sample size, which can cause instability during training and unreliable inference. To address these challenges, transfer learning (TL) has been adopted in biomedical ML applications to transfer knowledge across diverse but related biological contexts. Building on this principle, we introduce an unsupervised multi-view TL algorithm, named MVTOT [1], which enables the analysis of various biomarkers across different cancer types. Specifically, we compress high-dimensional biomarkers from different cancer types into a low-dimensional feature space via nonnegative matrix factorization, and we distill the information shared across cancer types using the Wasserstein distance defined by Optimal Transport theory. We evaluate stratification performance on three early-stage cancers from The Cancer Genome Atlas (TCGA) project. Compared with benchmark methods, our framework stratifies patient survival outcomes more accurately.
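The compression step described above can be illustrated with a minimal sketch of nonnegative matrix factorization. This is not the MVTOT implementation: it assumes the standard Lee-Seung multiplicative updates for the squared-error objective, omits the multi-view and optimal-transport terms entirely, and uses a made-up toy matrix. The names `nmf`, `frobenius_error`, and `X` are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((xij - rij) ** 2 for xr, rr in zip(x, wh) for xij, rij in zip(xr, rr))

def nmf(x, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor nonnegative X (n x p) into W (n x rank) and H (rank x p)
    with Lee-Seung multiplicative updates (an assumed, generic choice)."""
    rng = random.Random(seed)
    n, p = len(x), len(x[0])
    # Strictly positive random initialisation keeps every update well defined.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(p)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(p)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Toy nonnegative "patients x biomarkers" matrix (made-up numbers).
X = [
    [5.0, 4.0, 0.0, 0.0, 1.0, 0.0],
    [4.0, 5.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 4.0, 0.0, 1.0],
    [0.0, 1.0, 4.0, 5.0, 0.0, 0.0],
]
W, H = nmf(X, rank=2)  # each row of W is a 2-dimensional patient embedding
```

Each row of `W` is a low-dimensional representation of one patient, and each row of `H` is a nonnegative loading over the original biomarkers. In practice a library routine would replace these hand-written loops.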

Additionally, while patient-level stratification has enhanced clinical decision-making, our understanding of diseases at the single-cell (SC) level, which is crucial for deciphering disease progression mechanisms, monitoring drug responses, and prioritizing drug targets, remains limited. It is essential to associate each SC with patient-level clinical traits such as survival hazard, drug response, and disease subtype. However, SC samples often lack direct labels for these traits, and the significant statistical gap between patient-level and SC-level gene expression impedes the transfer of well-annotated patient-level disease attributes to SCs. Domain adaptation (DA), a subfield of TL, addresses this challenge by training a domain-invariant feature extractor for both patient and SC gene expression matrices, so that ML models trained on patient-level data can be applied to SC samples. Building on an established deep-learning-based DA model, DEGAS [2], we replace its computationally inefficient maximum mean discrepancy loss with the Wasserstein distance as the measure of domain discrepancy. This substitution facilitates the embedding of both SC and patient inputs into a common latent feature space. Then, using the model trained on patient-level disease attributes, we predict SC-level survival hazard, disease status, and drug response in prostate cancer, Alzheimer's disease, and multiple myeloma SC data, respectively. Our approach outperforms benchmark studies, uncovering clinically significant cell subgroups and revealing the correlation between survival hazard and drug response at the SC level.
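To make the two domain-discrepancy measures mentioned above concrete, the following sketch computes both on a single latent feature. It is an illustration only, not the DEGAS loss or the loss used in this work: it assumes one-dimensional features, equal sample sizes, and a Gaussian kernel with a fixed bandwidth. The names `wasserstein_1d`, `mmd_squared`, `patient_feature`, and `cell_feature` are hypothetical.

```python
import math

def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.
    With equal sizes, the optimal transport plan pairs the sorted values."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy
    with a Gaussian (RBF) kernel of the given bandwidth."""
    def kernel(a, b):
        return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

    def mean_kernel(us, vs):
        # Averages the kernel over every pair, so cost grows with len(us) * len(vs).
        return sum(kernel(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

# Made-up latent feature values for the two domains; the cell values are
# the patient values shifted by 2.0 to mimic a domain gap.
patient_feature = [0.0, 1.0, 2.0, 3.0]
cell_feature = [x + 2.0 for x in patient_feature]

shift_w1 = wasserstein_1d(patient_feature, cell_feature)   # equals the shift, 2.0
shift_mmd = mmd_squared(patient_feature, cell_feature)     # positive, kernel-dependent
```

In this toy case the Wasserstein-1 value equals the size of the shift between the two samples, whereas the MMD value depends on the chosen kernel bandwidth. A feature extractor trained for domain adaptation would be penalized by one of these quantities so that patient and SC embeddings overlap.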

Beyond these approaches, we also demonstrate the effectiveness of TL and image analysis in stratifying patients with early-stage and late-stage Mild Cognitive Impairment based on neuroimaging, and in predicting survival and metastasis in melanoma based on histological images. These applications underscore the potential of ML methods, especially TL algorithms, to address biomedical problems from multiple angles, thereby enhancing our understanding of disease mechanisms and supporting the development of new biomarkers that predict patient outcomes.


Degree Type

  • Doctor of Philosophy


Department

  • Statistics

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Bruce A. Craig

Advisor/Supervisor/Committee co-chair

Min Zhang

Additional Committee Member 2

Xiao Wang

Additional Committee Member 3

Xiaoqian (Joy) Wang

Additional Committee Member 4

Kun Huang
