Markov jump processes are continuous-time stochastic processes widely used in a variety of applied disciplines. Inference typically proceeds via Markov chain Monte Carlo (MCMC), the state of the art being a uniformization-based auxiliary variable Gibbs sampler. This sampler was designed for situations where the process parameters are known, and Bayesian inference over unknown parameters is typically carried out by embedding it as one step within a larger Gibbs sampler. This strategy of alternately sampling parameters given the path and the path given the parameters can result in poor Markov chain mixing.
In this thesis, we focus on the problem of path and parameter inference for Markov jump processes.
In the first part of the thesis, we propose a simple and efficient MCMC algorithm for this problem. Our scheme brings Metropolis-Hastings approaches for discrete-time hidden Markov models to the continuous-time setting, resulting in a complete and clean recipe for parameter and path inference in Markov jump processes. In our experiments, we demonstrate superior performance over Gibbs sampling, over a more naive Metropolis-Hastings algorithm that we also propose, and over another popular approach, particle Markov chain Monte Carlo. We also show that our sampler inherits geometric mixing from an ‘ideal’ sampler that is computationally much more expensive.
In the second part of the thesis, we propose a novel collapsed variational inference algorithm. The algorithm leverages ideas from discrete-time Markov chains, exploiting the connection between Markov jump processes and discrete-time Markov chains that uniformization provides. It proceeds by marginalizing out the parameters of the Markov jump process and then approximating the distribution over the trajectory with a factored distribution over segments of a piecewise-constant function. Unlike MCMC schemes that marginalize out the transition times of a piecewise-constant process, our scheme optimizes the discretization of time, resulting in significant computational savings. We apply our ideas to synthetic data as well as to a dataset of check-in recordings, where we demonstrate superior performance over state-of-the-art MCMC methods.
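Both parts of the thesis rely on uniformization, so a minimal sketch of that construction may help fix ideas. It illustrates only the standard textbook link between a Markov jump process and a discrete-time Markov chain, not the samplers or the variational algorithm proposed here. The function names (`uniformize`, `sample_path`, `transition_matrix`), the default choice of the dominating rate `omega` as twice the largest leaving rate, and the example rate matrix are all assumptions made for illustration.

```python
import numpy as np

def uniformize(A, omega=None):
    """Return the discrete-time transition matrix B = I + A / omega.

    A is a rate matrix (rows sum to zero, off-diagonals non-negative) with
    at least one non-zero rate. omega must dominate every leaving rate
    |A_ii|; the default of twice the largest rate is an illustrative choice.
    """
    A = np.asarray(A, dtype=float)
    max_rate = np.max(-np.diag(A))
    if omega is None:
        omega = 2.0 * max_rate
    if omega < max_rate:
        raise ValueError("omega must be at least max_i |A_ii|")
    B = np.eye(A.shape[0]) + A / omega
    return B, omega

def sample_path(A, pi0, T, rng, omega=None):
    """Sample a jump-process path on [0, T] via uniformization."""
    B, omega = uniformize(A, omega)
    # Candidate times: a rate-omega Poisson process on [0, T].
    n = rng.poisson(omega * T)
    candidates = np.sort(rng.uniform(0.0, T, size=n))
    state = rng.choice(len(pi0), p=pi0)
    times, states = [0.0], [state]
    for t in candidates:
        new_state = rng.choice(B.shape[0], p=B[state])
        # Self-transitions are "virtual" jumps and leave the path unchanged.
        if new_state != state:
            times.append(t)
            states.append(new_state)
            state = new_state
    return np.array(times), np.array(states)

def transition_matrix(A, t, n_terms=200):
    """Approximate exp(A t) as a Poisson(omega * t) mixture of powers of B."""
    B, omega = uniformize(A)
    P = np.zeros_like(B)
    B_power = np.eye(B.shape[0])
    weight = np.exp(-omega * t)  # Poisson probability of zero candidate times
    for k in range(n_terms):
        P += weight * B_power
        B_power = B_power @ B
        weight *= omega * t / (k + 1)
    return P
```

For example, with the hypothetical two-state rate matrix `A = [[-1, 1], [2, -2]]`, calling `sample_path(A, [1.0, 0.0], 5.0, np.random.default_rng(0))` returns the jump times and the state entered at each one, and `transition_matrix(A, 0.5)` should agree with the closed-form two-state transition probabilities. Any `omega` at least as large as the largest leaving rate gives the same path distribution; larger values only add more virtual jumps.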