# Adaptive Sampling Methods for Stochastic Optimization

This dissertation investigates the use of sampling methods for solving stochastic optimization problems using iterative algorithms. Two sampling paradigms are considered: (i) adaptive sampling, where, before each iterate update, the sample size for estimating the objective function and the gradient is adaptively chosen; and (ii) retrospective approximation (RA), where, iterate updates are performed using a chosen fixed sample size for as long as progress is deemed statistically significant, at which time the sample size is increased. We investigate adaptive sampling within the context of a trust-region framework for solving stochastic optimization problems in $\mathbb{R}^d$, and retrospective approximation within the broader context of solving stochastic optimization problems on a Hilbert space. In the first part of the dissertation, we propose Adaptive Sampling Trust-Region Optimization (ASTRO), a class of derivative-based stochastic trust-region (TR) algorithms developed to solve smooth stochastic unconstrained optimization problems in $\mathbb{R}^{d}$ where the objective function and its gradient are observable only through a noisy oracle or using a large dataset. Efficiency in ASTRO stems from two key aspects: (i) adaptive sampling to ensure that the objective function and its gradient are sampled only to the extent needed, so that small sample sizes are chosen when the iterates are far from a critical point and large sample sizes are chosen when iterates are near a critical point; and (ii) quasi-Newton Hessian updates using BFGS. We prove three main results for ASTRO and for general stochastic trust-region methods that estimate function and gradient values adaptively, using sample sizes that are stopping times with respect to the sigma algebra of the generated observations. The first asserts strong consistency when the adaptive sample sizes have a mild logarithmic lower bound, assuming that the oracle errors are light-tailed. The second and third results characterize the iteration and oracle complexities in terms of certain risk functions. Specifically, the second result asserts that the best achievable $\mathcal{O}(\epsilon^{-1})$ iteration complexity (of squared gradient norm) is attained when the total relative risk associated with the adaptive sample size sequence is finite; and the third result characterizes the corresponding oracle complexity in terms of the total generalized risk associated with the adaptive sample size sequence. We report encouraging numerical results in certain settings. In the second part of this dissertation, we consider the use of RA as an alternate adaptive sampling paradigm to solve smooth stochastic constrained optimization problems in infinite-dimensional Hilbert spaces. RA generates a sequence of subsampled deterministic infinite-dimensional problems that are approximately solved within a dynamic error tolerance. The bottleneck in RA becomes solving this sequence of problems efficiently. To this end, we propose a progressive subspace expansion (PSE) framework to solve smooth deterministic optimization problems in infinite-dimensional Hilbert spaces with a TR Sequential Quadratic Programming (SQP) solver. The infinite-dimensional optimization problem is discretized, and a sequence of finite-dimensional problems are solved where the problem dimension is progressively increased. Additionally, (i) we solve this sequence of finite-dimensional problems only to the extent necessary, i.e., we spend just enough computational work needed to solve each problem within a dynamic error tolerance, and (ii) we use the solution of the current optimization problem as the initial guess for the subsequent problem. We prove two main results for PSE. The first assesses convergence to a first-order critical point of a subsequence of iterates generated by the PSE TR-SQP algorithm. The second characterizes the relationship between the error tolerance and the problem dimension, and provides an oracle complexity result for the total amount of computational work incurred by PSE. This amount of computational work is closely connected to three quantities: the convergence rate of the finite-dimensional spaces to the infinite-dimensional space, the rate of increase of the cost of making oracle calls in finite-dimensional spaces, and the convergence rate of the solution method used. We also show encouraging numerical results on an optimal control problem supporting our theoretical findings.

## History

## Degree Type

- Doctor of Philosophy

## Department

- Statistics

## Campus location

- West Lafayette