ENHANCING THE STOCHASTIC GRADIENT MARKOV CHAIN MONTE CARLO BY POPULATION
The scalability of stochastic gradient MCMC algorithms has made them increasingly important for Bayesian analysis of big data. However, these algorithms are prone to getting trapped in local energy minima. One effective way to address this local-trap issue is to employ a population of Markov chains that interact with each other, so that global information is visible to each local chain and can help it escape local traps. Two novel algorithms are proposed here: evolutionary stochastic gradient Langevin dynamics (ESGLD) and local stochastic particle-optimization sampling (LSPOS). ESGLD operates with a population of Markov chains, each at a different temperature, and these chains interact through a snooker operator. The snooker operator performs a short run of SGLD along the line connecting the current sample to a selected anchor point. Once a mode has been located by other chains in the population, the target chain can transition to it by running the snooker operator, particularly when the mode is sharp and well separated from others. The snooker operator enables the Markov chain to escape local traps effectively while maintaining invariance with respect to the target distribution. This significantly accelerates convergence, especially when the energy landscape is rugged.

LSPOS is an improvement on stochastic particle-optimization sampling (SPOS). In SPOS, each chain in the population runs SGLD plus an interaction through the Stein variational operator, which ensures the asymptotic unbiasedness of the sample distribution with respect to the target distribution. The kernel usually used in the Stein variational operator in SPOS is the radial basis function (RBF) kernel. In LSPOS, this kernel is instead chosen to be a local truncation of the RBF kernel.
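The snooker move described above can be sketched in a few lines. The code below is a minimal illustration, not the dissertation's implementation: it assumes a standard SGLD update and reduces the snooker operator to a short one-dimensional SGLD run along the line joining the current sample to an anchor point; the function names, step sizes, and the projection of the gradient onto the line direction are all our own assumptions.

```python
import numpy as np

def sgld_step(theta, grad_log_post, step_size, rng):
    """One plain SGLD step: half-step gradient ascent on the log
    posterior plus scaled Gaussian noise."""
    noise = rng.normal(size=theta.shape)
    return theta + 0.5 * step_size * grad_log_post(theta) + np.sqrt(step_size) * noise

def snooker_move(theta, anchor, grad_log_post, step_size, n_steps, rng):
    """Sketch (our assumption) of a snooker operator: run a short 1-D
    SGLD chain for the position t along the line through `theta` and a
    selected `anchor` point, then return the sample at that position."""
    direction = anchor - theta
    direction = direction / np.linalg.norm(direction)
    t = 0.0  # position along the line; t = 0 is the current sample
    for _ in range(n_steps):
        # project the full-space gradient onto the line direction
        g = grad_log_post(theta + t * direction) @ direction
        t += 0.5 * step_size * g + np.sqrt(step_size) * rng.normal()
    return theta + t * direction
```

Because the 1-D run is driven by the same log posterior, a well-tuned move of this form lets the chain slide toward a mode already found by another chain instead of diffusing through a low-probability region.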
This change confines the interactions to neighboring chains and prevents a chain from learning from distant chains whose local geometry is fundamentally different; such learning does not help and can even impede the chain's escape from a local trap. We provide theoretical guarantees for the convergence of the ESGLD and LSPOS algorithms, and demonstrate their numerical effectiveness through simulated and real-data examples.
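The effect of truncating the kernel can be sketched as follows. This is a hedged illustration under our own assumptions, not the dissertation's code: it uses a hard cutoff at a radius `radius` for the truncation and an SVGD-style interaction term of the form phi(x_i) = (1/n) sum_j [k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i)]; the exact truncation scheme and update used in LSPOS may differ.

```python
import numpy as np

def truncated_rbf(x, y, bandwidth, radius):
    """Locally truncated RBF kernel (sketch): the usual RBF kernel
    inside `radius`, exactly zero beyond it, so only nearby chains
    interact."""
    d2 = np.sum((x - y) ** 2)
    if d2 > radius ** 2:
        return 0.0
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def stein_drift(particles, grad_log_post, bandwidth, radius):
    """SVGD-style interaction term with the truncated kernel (sketch):
    each particle is driven by kernel-weighted gradients of nearby
    particles plus a repulsive kernel-gradient term."""
    n, _ = particles.shape
    drift = np.zeros_like(particles)
    for i in range(n):
        for j in range(n):
            k = truncated_rbf(particles[j], particles[i], bandwidth, radius)
            if k == 0.0:
                continue  # distant chains contribute nothing
            # gradient of the RBF kernel with respect to x_j
            grad_k = k * (particles[i] - particles[j]) / bandwidth ** 2
            drift[i] += k * grad_log_post(particles[j]) + grad_k
    return drift / n
```

With the plain RBF kernel every pair of chains interacts, however far apart; with the truncated kernel the inner sum skips distant chains entirely, which is exactly the locality the abstract argues for.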
Funding
NSF DMS-2015498
NSF DMS-2210819
NIH R01-GM152717
History
Degree Type
- Doctor of Philosophy
Department
- Statistics
Campus location
- West Lafayette