File(s) under embargo until file(s) become available

Clinical Analytics and Personalized Medicine

posted on 19.10.2022, 15:45 authored by Chih-Hao Fang

The increasing volume and availability of Electronic Health Records (EHRs) open up opportunities for computational models to improve patient care. Key factors in improving patient outcomes include identifying patient sub-groups with distinct characteristics and providing personalized treatment actions with expected improved outcomes. This thesis investigates how well-formulated matrix decomposition and causal inference techniques can be leveraged to tackle the problems of disease sub-typing and inferring treatment recommendations in healthcare. In particular, the research resulted in computational techniques based on archetypal analysis to identify and analyze disease sub-types and a causal reinforcement learning method for learning treatment recommendations. Our work on these techniques is divided into four parts in this thesis:

In the first part of the thesis, we present a retrospective study of sepsis patients in intensive care environments using patient data. Sepsis accounts for more than 50% of hospital deaths, and the associated cost ranks the highest among hospital admissions in the US. Sepsis may be misdiagnosed because the patient is not thoroughly assessed or the symptoms are misinterpreted, which can lead to serious health complications or even death. An improved understanding of disease states, progression, severity, and clinical markers can significantly improve patient outcomes and reduce costs. We have developed a computational framework based on archetypal analysis that identifies disease states in sepsis using clinical variables and samples in the MIMIC-III database. Each identified state is associated with different manifestations of organ dysfunction. Patients in different states form statistically significantly distinct populations with disparate demographic and comorbidity profiles. We furthermore model disease progression using a Markov chain. Our progression model accurately characterizes the severity level of each pathological trajectory and identifies significant changes in clinical variables and treatment actions during sepsis state transitions. Collectively, our framework provides a holistic view of sepsis, and our findings provide the basis for the future development of clinical trials and therapeutic strategies for sepsis. These results have significant implications for a large number of hospitalizations.
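As a rough illustration of the progression modeling described above, the sketch below estimates Markov chain transition probabilities by counting consecutive state pairs in per-patient state sequences. The function name, the integer state encoding, and the toy trajectories are hypothetical; the abstract does not specify how the thesis fits its progression model.

```python
def estimate_transition_matrix(trajectories, n_states):
    """Maximum-likelihood transition probabilities from observed state sequences.

    trajectories: list of sequences of integer state ids in [0, n_states).
    Returns an n_states x n_states list of lists; row i holds P(next = j | current = i).
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in trajectories:
        # Count every observed transition between consecutive time steps.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # States with no observed outgoing transition keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical toy trajectories over three states; not data from the thesis.
toy_trajectories = [[0, 1, 1, 2], [0, 0, 1], [1, 2, 2]]
P = estimate_transition_matrix(toy_trajectories, 3)
```

Each row of `P` with at least one observed transition sums to one, so it can be read directly as the distribution over next states.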

In the second part, we focus on the problem of recommending optimal personalized treatment policies from observational data. Treatment policies are typically based on randomized controlled trials (RCTs); these policies are often sub-optimal, inconsistent, and potentially biased. Using observational data, we formulate suitable objective functions that encode causal reasoning in a reinforcement learning (RL) framework and present efficient algorithms for learning optimal treatment policies using interventional and counterfactual reasoning. We demonstrate the efficacy of our method on two observational datasets: (i) observational data to study the effectiveness of right heart catheterization (RHC) in the initial care of 5735 critically ill patients, and (ii) data from the Infant Health and Development Program (IHDP), aimed at estimating the effect of the intervention on neonatal health for 985 low-birth-weight, premature infants. For the RHC dataset, our method's policy prescribes RHC for 11.5% of the patients, compared to the best current method, which prescribes RHC for 38% of the patients. Even with this significantly reduced intervention, our policy yields a 1.5% improvement in the 180-day survival rate and a 2.2% improvement in the 30-day survival rate. For the IHDP dataset, we observe a 3.16% improvement in the rate of improvement of neonatal health using our method's policy.

In the third part, we consider the Supervised Archetypal Analysis (SAA) problem, which incorporates label information to compute archetypes. We formulate a new constrained optimization problem incorporating Laplacian regularization to guide archetypes towards groupings of similar data points, resulting in label-coherent archetypes and label-consistent soft assignments. We first use the MNIST dataset to show that SAA can yield better cluster quality than baselines for any chosen number of archetypes. We then use the CelebFaces Attributes dataset to demonstrate the superiority of SAA in terms of cluster quality and interpretability over competing supervised and unsupervised methods. We also demonstrate the interpretability of SAA decompositions in the context of a movie rating application. We show that the archetypes from SAA can be directly interpreted as user ratings and encode class-specific movie preferences. Finally, we demonstrate how the SAA archetypes can be used for personalized movie recommendations.
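To make the idea of Laplacian regularization from labels concrete, here is a minimal sketch under a simple assumption: two samples are connected with weight 1 when they share a label, and the penalty is the quadratic form tr(AᵀLA) over the rows of a soft-assignment matrix A. The graph construction, the function names, and the toy data are illustrative assumptions; the abstract does not state the exact regularizer used in SAA.

```python
def label_laplacian(labels):
    """Graph Laplacian L = D - W, assuming W[i][j] = 1 when i != j share a label."""
    n = len(labels)
    W = [[1.0 if i != j and labels[i] == labels[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # The degree of node i (sum of row i of W) sits on the diagonal.
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def laplacian_penalty(L, A):
    """tr(A^T L A), where A[i] is the soft-assignment row of sample i.

    Equals the sum over connected pairs of the squared distance between
    their assignment rows, so it is zero when same-label samples agree.
    """
    n, k = len(A), len(A[0])
    return sum(L[i][j] * sum(A[i][c] * A[j][c] for c in range(k))
               for i in range(n) for j in range(n))


# Hypothetical toy example: four samples, two labels, two archetypes.
labels = [0, 0, 1, 1]
L = label_laplacian(labels)
coherent = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
penalty_coherent = laplacian_penalty(L, coherent)
penalty_mixed = laplacian_penalty(L, mixed)
```

In this toy case the label-consistent assignment incurs no penalty while the mixed assignment is penalized, which is the behavior a regularizer of this kind is meant to encourage.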

In the last part of this thesis, we apply our SAA technique to clinical settings. We study the problem of developing methods for ventilation recommendations for sepsis patients. Mechanical ventilation is an essential and commonly prescribed intervention for sepsis patients. However, studies have shown that mechanical ventilation is associated with higher mortality rates on average. It is generally believed that this is a consequence of the broad use of ventilation, and that a more targeted use can significantly improve the average treatment effect (ATE) and, consequently, survival rates. We develop a computational framework using Supervised Archetypal Analysis to stratify our cohort to identify groups that benefit from ventilators. We use SAA to group patients based on pre-treatment variables as well as treatment outcomes by constructing a Laplacian regularizer from treatment response (label) information and incorporating it into the objective function of AA. Using our sepsis cohort, we demonstrate that our method can effectively stratify our cohort into sub-cohorts that have positive and negative ATEs, corresponding to groups of patients that should and should not receive mechanical ventilation, respectively.

We then train a classifier to identify patient sub-cohorts with positive and negative treatment effects. We show that our treatment recommender, on average, has a high positive ATE for patients that are recommended ventilator support and a slightly negative ATE for those not recommended ventilator support. We use SHAP (SHapley Additive exPlanations) techniques for generating clinical explanations for our classifier and demonstrate their use in the generation of patient-specific classification and explanation. Our framework provides a powerful new tool to assist in the clinical assessment of sepsis patients for ventilator use.
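The sub-cohort comparison above can be illustrated with a simple difference-in-means estimate of the average treatment effect within each group. This is a naive sketch that ignores confounding; the estimator used in the thesis is not given in the abstract, and the function name, record layout, and toy records below are hypothetical.

```python
def ate_by_group(records):
    """Naive per-group ATE: mean treated outcome minus mean untreated outcome.

    records: iterable of (group, treated, outcome) tuples.
    Groups lacking either treated or untreated samples are skipped.
    """
    stats = {}
    for group, treated, outcome in records:
        # For each group keep [sum of outcomes, count] per treatment arm.
        arms = stats.setdefault(group, {True: [0.0, 0], False: [0.0, 0]})
        arm = arms[bool(treated)]
        arm[0] += outcome
        arm[1] += 1

    result = {}
    for group, arms in stats.items():
        (t_sum, t_n), (c_sum, c_n) = arms[True], arms[False]
        if t_n and c_n:
            result[group] = t_sum / t_n - c_sum / c_n
    return result


# Hypothetical toy records (group, treated, survived); not data from the thesis.
toy_records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 0, 1),
    ("B", 1, 0), ("B", 1, 1), ("B", 0, 1), ("B", 0, 1),
]
ates = ate_by_group(toy_records)
```

Under this toy setup, group "A" shows a positive estimated effect and group "B" a negative one, mirroring the kind of split between sub-cohorts that should and should not receive the intervention.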


Degree Type

Doctor of Philosophy


Department

Computer Science

Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

Ananth Y. Grama

Advisor/Supervisor/Committee co-chair

Petros S. Drineas

Additional Committee Member 2

David F. Gleich

Additional Committee Member 3

Alex Pothen

Additional Committee Member 4

Wojciech Szpankowski