Reason: Some of the chapters in this thesis are under review and submission for publication.
until file(s) become available
Variational Inference for Data-driven Stochastic Programming
thesisposted on 30.07.2021, 16:37 by Prateek JaiswalPrateek Jaiswal
Stochastic programs are standard models for decision-making under uncertainty and have been extensively studied in the operations research literature. In general, stochastic programming involves minimizing an expected cost function, where the expectation is with respect to fully specified stochastic models that quantify the aleatoric or `inherent' uncertainty in the decision-making problem. In practice, however, the stochastic models are unknown but can be estimated from data, introducing an additional epistemic uncertainty into the decision-making problem. The Bayesian framework provides a coherent way to quantify the epistemic uncertainty through the posterior distribution by combining prior beliefs of the decision-makers with the observed data. Bayesian methods have been used for data-driven decision-making in various applications such as inventory management, portfolio design, machine learning, optimal scheduling, and staffing, etc.
Bayesian methods are challenging to implement, mainly due to the fact that the posterior is computationally intractable, necessitating the computation of approximate posteriors. Broadly speaking, there are two methods in the literature implementing approximate posterior inference. First are sampling-based methods such as Markov Chain Monte Carlo. Sampling-based methods are theoretically well understood, but they suffer from various issues like high variance, poor scalability to high-dimensional problems, and have complex diagnostics. Consequently, we propose to use optimization-based methods collectively known as variational inference (VI) that use information projections to compute an approximation to the posterior. Empirical studies have shown that VI methods are computationally faster and easily scalable to higher-dimensional problems and large datasets. However, the theoretical guarantees of these methods are not well understood. Moreover, VI methods are empirically and theoretically less explored in the decision-theoretic setting.
In this thesis, we first propose a novel VI framework for risk-sensitive data-driven decision-making, which we call risk-sensitive variational Bayes (RSVB). In RSVB, we jointly compute a risk-sensitive approximation to the `true' posterior and the optimal decision by solving a minimax optimization problem. The RSVB framework includes the naive approach of first computing a VI approximation to the true posterior and then using it in place of the true posterior for decision-making. We show that the RSVB approximate posterior and the corresponding optimal value and decision rules are asymptotically consistent, and we also compute their rate of convergence. We illustrate our theoretical findings in both parametric as well as nonparametric setting with the help of three examples: the single and multi-product newsvendor model and Gaussian process classification. Second, we present the Bayesian joint chance-constrained stochastic program (BJCCP) for modeling decision-making problems with epistemically uncertain constraints. We discover that using VI methods for posterior approximation can ensure the convexity of the feasible set in (BJCCP) unlike any sampling-based methods and thus propose a VI approximation for (BJCCP). We also show that the optimal value computed using the VI approximation of (BJCCP) are statistically consistent. Moreover, we derive the rate of convergence of the optimal value and compute the rate at which a VI approximate solution of (BJCCP) is feasible under the true constraints. We demonstrate the utility of our approach on an optimal staffing problem for an M/M/c queue. Finally, this thesis also contributes to the growing literature in understanding statistical performance of VI methods. In particular, we establish the frequentist consistency of an approximate posterior computed using a well known VI method that computes an approximation to the posterior distribution by minimizing the Renyi divergence from the ‘true’ posterior.