File(s) under embargo
Reason: The files are under embargo for the purpose of publishing journal articles.
A Bayesian Semiparametric Approach to Estimating a Bacterium's Wild-Type Distribution and Prevalence: Accounting for Contamination and Measurement Error
Thesis posted on 19.11.2020, 16:38, authored by Will A Eagan
Antimicrobial resistance (AMR) is a major challenge to modern medicine and of grave concern to public health. To monitor AMR, researchers analyze "drug/bug" collections of clinical assay results to estimate AMR prevalence and the distribution of susceptible (wild-type) strains. This estimation is challenging because (a) the collection of assay results is a mixture of susceptible and resistant (non-wild-type) strains and (b) the most commonly used dilution assay produces interval-censored readings. To limit the effects of contamination from non-wild-type strains, earlier methods have focused on using the counts in the K left-most bins, with K chosen by different heuristics. This limited use of the available data can reduce the precision and accuracy of the parameter estimates. More recent methods have fit all the bin counts using a mixture model. These methods, however, struggle with identifiability and rely on penalization or informative priors to obtain reasonable estimates. In addition, none of these methods specifically accounts for the inherent assay variability, which has been shown to encompass a three-fold dilution range.
To account for this measurement error and utilize the full data set of bin counts, we propose a Bayesian semiparametric method to handle both single-year and multiyear studies. Similar to the previous mixture model methods, we model the wild-type distribution parametrically. Because less is known about the non-wild-type distribution, the proposed method uses a Dirichlet Process mixture model for the non-wild-type distribution. By accounting for measurement error, we are able to impose biological constraints on the degree of overlap between the two underlying true distributions. In doing this, we maintain identifiability. The feasibility of this approach and its improved precision and accuracy are demonstrated through simulation studies and an application to a real data set.
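To make the data structure concrete, the following is a minimal sketch, not the thesis's implementation, of how expected bin probabilities for interval-censored readings could be computed under a two-population mixture with measurement error. It rests on several assumptions that the abstract does not state: readings are on a log2 concentration scale, each component is normal, the measurement error is normal and independent of the true value, and a fixed finite mixture of normals stands in for the Dirichlet process mixture of the non-wild-type population. All function names and parameter values are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """CDF of a normal distribution; handles x = +/- infinity."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def bin_probabilities(edges, components, error_sd):
    """Probability that an observed reading falls in each dilution bin.

    edges      : increasing log2 bin boundaries; the bins are
                 (-inf, e0], (e0, e1], ..., (e_last, +inf)
    components : list of (weight, mean, sd) for the true log2 values
    error_sd   : sd of the assumed normal measurement error
    """
    lowers = [-math.inf] + list(edges)
    uppers = list(edges) + [math.inf]
    probs = []
    for lo, hi in zip(lowers, uppers):
        p = 0.0
        for weight, mean, sd in components:
            # True value plus independent normal error is again normal,
            # with the two variances added.
            total_sd = math.sqrt(sd ** 2 + error_sd ** 2)
            p += weight * (normal_cdf(hi, mean, total_sd)
                           - normal_cdf(lo, mean, total_sd))
        probs.append(p)
    return probs


# Hypothetical example: 80% wild-type, 20% non-wild-type split over two
# components that stand in for a Dirichlet process mixture.
edges = list(range(-4, 8))
components = [
    (0.80, -1.0, 0.7),  # wild-type (parametric component)
    (0.12, 3.0, 0.8),   # non-wild-type, stand-in component 1
    (0.08, 5.0, 0.5),   # non-wild-type, stand-in component 2
]
probs = bin_probabilities(edges, components, error_sd=0.5)
```

Given observed bin counts, these probabilities would enter a multinomial likelihood. The sketch omits the prior specification, the posterior sampling, and the biological constraint on overlap described above.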