GOING FOR IT ALL: IDENTIFICATION OF ENVIRONMENTAL RISK FACTORS AND PREDICTION OF GESTATIONAL DIABETES MELLITUS USING MULTI-LEVEL LOGISTIC REGRESSION IN THE PRESENCE OF CLASS IMBALANCE
Gestational Diabetes Mellitus (GDM) is defined as glucose intolerance with first onset during pregnancy in women without a previous history of diabetes. The global prevalence of GDM ranges from 2% to 17%, varying across countries and ethnicities. In the United States (U.S.), up to 13% of pregnancies are affected by this disease every year. Several risk factors for GDM are well established, such as race, age, and BMI, and additional factors have been proposed that could affect the risk of developing the disease; some of these are modifiable, such as diet, while others, such as environmental factors, are not.
Taking effective preventive action against GDM requires the early identification of women at highest risk. A crucial task to this end is establishing the factors that increase the probability of developing the disease. These factors include individual characteristics and choices and likely also include environmental conditions.
The first part of the dissertation examines the relationship between food insecurity and GDM using the National Health and Nutrition Examination Survey (NHANES), which has a representative sample of the U.S. population. The aim of this analysis is to determine a national estimate of the impact of food environment on the likelihood of developing GDM, stratified by race and ethnicity. A survey-weighted logistic regression model is used to assess these relationships, which are described using odds ratios.
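As a rough illustration of how survey weights enter an odds ratio, the sketch below handles the simplest case: one binary exposure and one binary outcome. In that case the exponentiated coefficient of a weighted logistic regression equals the cross-product ratio of the weighted 2x2 table. The function name, the field order, and the toy records are hypothetical and are not taken from NHANES; the sketch also ignores strata and sampling units, so it gives a point estimate only, not design-based standard errors.

```python
from collections import defaultdict


def weighted_odds_ratio(records):
    """Weighted odds ratio for a binary exposure and a binary outcome.

    records: iterable of (exposed, outcome, weight) tuples with 0/1 flags.
    """
    totals = defaultdict(float)
    for exposed, outcome, weight in records:
        # Accumulate the survey weight in the matching cell of the 2x2 table.
        totals[(exposed, outcome)] += weight
    odds_exposed = totals[(1, 1)] / totals[(1, 0)]
    odds_unexposed = totals[(0, 1)] / totals[(0, 0)]
    return odds_exposed / odds_unexposed


# Hypothetical toy data: (food_insecure, gdm, survey_weight).
toy = [
    (1, 1, 2.0), (1, 0, 6.0),
    (0, 1, 1.0), (0, 0, 9.0),
    (1, 0, 2.0), (0, 0, 3.0),
]
or_hat = weighted_odds_ratio(toy)
print(round(or_hat, 3))
```

With several covariates the same idea applies, but the coefficients have to be estimated iteratively, and a stratified analysis would repeat the calculation within each race and ethnicity group.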
The goal of the second part of this research is to determine whether a woman’s risk of developing GDM is affected by her environment, described in this work by level 2 variables. For that purpose, Medicaid claims information from Indiana was analyzed using a multilevel logistic regression model, with sample balancing used to reduce the class imbalance.
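The abstract does not say which balancing technique was used. As one possible illustration, the sketch below applies random undersampling of the majority class, which is an assumption and not necessarily the method of the dissertation. The function name, the `ratio` parameter, and the toy rows (with a `county` field standing in for a level 2 unit) are all hypothetical.

```python
import random


def undersample_majority(rows, label_of, ratio=1.0, seed=0):
    """Keep every minority row and a random subset of majority rows.

    ratio is the target number of majority rows per minority row.
    """
    rng = random.Random(seed)
    positives = [r for r in rows if label_of(r) == 1]
    negatives = [r for r in rows if label_of(r) == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    # Never request more majority rows than are available.
    n_keep = min(len(majority), int(round(ratio * len(minority))))
    balanced = minority + rng.sample(majority, n_keep)
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 200 women, 10% with GDM, nested in 5 counties.
rows = [
    {"id": i, "county": i % 5, "gdm": 1 if i % 10 == 0 else 0}
    for i in range(200)
]
balanced = undersample_majority(rows, lambda r: r["gdm"], ratio=2.0)
print(len(balanced), sum(r["gdm"] for r in balanced))
```

Undersampling changes the apparent prevalence and the number of observations per level 2 unit, so predicted probabilities from a model fitted to the balanced sample are not calibrated to the original population without further adjustment.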
Finally, in the third part of this dissertation, a simulation study was performed to examine the impact of balancing on prediction quality and on inference about model parameters when using multilevel logistic regression models. The data structure and the generating model were informed by the findings from the second project using the Medicaid data. This is particularly relevant for medical data in which individual-level measurements are combined with other data sources measured at the regional level, and in which both prediction and model interpretation are of interest.
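To make the simulated data structure concrete, the sketch below generates two-level binary data from a random-intercept logistic model with one individual-level and one regional-level covariate. It is only an assumed form of the generating model; every parameter value here is hypothetical and is not one of the values informed by the Medicaid analysis.

```python
import math
import random


def simulate_two_level(n_groups=50, n_per_group=200, intercept=-3.0,
                       beta_individual=0.5, beta_region=0.3,
                       sigma_u=0.5, seed=1):
    """Simulate (group, x, z, y) rows from a random-intercept logistic model."""
    rng = random.Random(seed)
    data = []
    for group in range(n_groups):
        u = rng.gauss(0.0, sigma_u)  # region-specific random intercept
        z = rng.gauss(0.0, 1.0)      # level 2 (regional) covariate
        for _ in range(n_per_group):
            x = rng.gauss(0.0, 1.0)  # level 1 (individual) covariate
            eta = intercept + beta_individual * x + beta_region * z + u
            p = 1.0 / (1.0 + math.exp(-eta))
            y = 1 if rng.random() < p else 0
            data.append((group, x, z, y))
    return data


data = simulate_two_level()
prevalence = sum(row[3] for row in data) / len(data)
print(len(data), round(prevalence, 3))
```

A strongly negative intercept produces a rare outcome, which is the class imbalance setting of interest. A simulation study would repeat this generation many times, fit the model with and without balancing, and compare parameter estimates and predictive performance.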
- Doctor of Philosophy
- Industrial Engineering
- West Lafayette