Mayank Kakodkar

Efficient and Scalable Subgraph Statistics using Regenerative Markov Chain Monte Carlo
https://hammer.purdue.edu/articles/thesis/Efficient_and_Scalable_Subgraph_Statistics_using_Regenerative_Markov_Chain_Monte_Carlo/19661022
In the past, sampling-based algorithms have been proposed that work well on moderately sized graphs for k ≤ 8. In this thesis I push this boundary to k ≤ 16 for graphs containing up to 120M edges, and to k ≤ 25 for smaller graphs containing between one million and 20M edges. I do so by re-imagining two older but elegant and memory-efficient algorithms -- FANMOD and PSRW -- which have large estimation errors by modern standards: FANMOD produces highly correlated samples of k-node connected induced subgraphs (k-CISs), and sampling transitions of the PSRW Markov chain becomes prohibitively expensive for k > 8.
In this thesis, I introduce:
(a) RTS: a novel regenerative Markov chain Monte Carlo (MCMC) sampling procedure over the tree generated on the fly by the FANMOD algorithm. RTS runs on multiple cores and multiple machines (it is embarrassingly parallel) and computes confidence intervals for its estimates, all while preserving the memory-efficient nature of FANMOD. RTS is thus able to estimate subgraph statistics for k ≤ 16 on larger graphs containing up to 120M edges, and for k ≤ 25 on smaller graphs containing between one million and 20M edges.
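The regenerative idea behind RTS can be illustrated on a toy chain: tours that start and end at a fixed regeneration state are i.i.d., so averaging over tours yields consistent estimates and CLT-based confidence intervals. The sketch below (a hand-made path graph and a standard ratio-of-tour-sums estimator, not the thesis's actual tree chain) is illustrative only:

```python
import math
import random

# Toy chain: simple random walk on a path graph 0..4; state 0 is the
# regeneration point. (Invented example, not data from the thesis.)
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

def run_tour(f, rng):
    """One regeneration tour: start at state 0, walk until the first
    return to 0. Returns (sum of f over visited states, tour length)."""
    state, total, length = 0, f(0), 1
    while True:
        state = rng.choice(NEIGHBORS[state])
        if state == 0:
            return total, length
        total += f(state)
        length += 1

def tour_estimate(f, n_tours, seed=0):
    """Ratio estimator of the (degree-weighted) stationary average of f,
    with a 1.96-sigma half-width from the CLT over i.i.d. tours."""
    rng = random.Random(seed)
    tours = [run_tour(f, rng) for _ in range(n_tours)]
    sum_f = sum(s for s, _ in tours)
    sum_len = sum(l for _, l in tours)
    est = sum_f / sum_len
    mean_len = sum_len / n_tours
    # Delta-method variance of the ratio, via per-tour residuals.
    resid = [s - est * l for s, l in tours]
    var = sum(r * r for r in resid) / (n_tours - 1)
    half = 1.96 * math.sqrt(var / n_tours) / mean_len
    return est, half
```

Because tours are independent, they can be farmed out across cores and machines and their sums merged afterwards, which is the sense in which such estimators are embarrassingly parallel.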
(b) R-PSRW: scales the PSRW algorithm to larger CIS sizes by using a rejection-sampling procedure to efficiently sample transitions of the PSRW Markov chain. R-PSRW matches RTS in its ability to scale to larger CIS sizes.
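The rejection-sampling idea can be shown generically: propose from a cheap-to-sample superset and keep only proposals that land in the hard-to-enumerate target set. The toy below (graph and target family are my own assumptions, not the R-PSRW transition kernel) draws uniform 3-node CISs by proposing uniform vertex triples and accepting only connected ones:

```python
import random

# Toy graph as adjacency sets. (Invented example, not data from the thesis.)
ADJ = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1, 4}, 4: {3}}

def is_connected(nodes):
    """Check that the subgraph induced by `nodes` is connected (DFS)."""
    nodes = set(nodes)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(ADJ[v] & nodes)
    return seen == nodes

def sample_cis(k, rng):
    """Uniform k-node CIS via rejection: propose a uniform k-subset of
    vertices (cheap), accept only if its induced subgraph is connected.
    Every valid CIS has equal proposal probability, so accepted samples
    are uniform over the CISs."""
    verts = list(ADJ)
    while True:
        cand = rng.sample(verts, k)
        if is_connected(cand):
            return frozenset(cand)
```

On dense regions the acceptance rate is high and each accepted transition is cheap; this is the general trade that makes rejection sampling attractive when enumerating the target set outright is the bottleneck.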
(c) Ripple: achieves unprecedented scalability by stratifying the R-PSRW Markov chain state space into ordered strata via a new technique that I call sequential stratified regeneration. I show that the Ripple estimator is consistent, highly parallelizable, and scales well. Ripple is able to count CISs of size k ≤ 12 in real-world graphs containing up to 120M edges.
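The stratification idea, in which counts of adjacent strata are related so that stratum sizes telescope, can be shown in miniature. The sketch below counts the 3-CIS stratum from the 2-CIS stratum (the edges) by weighting each extension by 1/(number of edges it contains), so every 3-CIS contributes exactly 1 in total. It is a deliberately brute-force toy on an invented graph; in Ripple the per-stratum sums are estimated by sampling, not enumerated:

```python
from itertools import combinations

# Toy graph as adjacency sets. (Invented example, not data from the thesis.)
ADJ = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1, 4}, 4: {3}}

def edges_inside(nodes):
    """Number of edges in the subgraph induced by `nodes`."""
    return sum(1 for u, v in combinations(nodes, 2) if v in ADJ[u])

def count_3cis_from_edges():
    """Telescoping count: N_3 = sum over edges e of
    sum over 3-CIS extensions S of e of 1 / (#edges inside S).
    Each 3-CIS S is reached once per edge it contains, so the
    1/c(S) weights make its total contribution exactly 1."""
    edges = {frozenset((u, v)) for u in ADJ for v in ADJ[u]}
    total = 0.0
    for e in edges:
        u, v = tuple(e)
        # Extensions: add any outside vertex adjacent to the edge.
        for w in (ADJ[u] | ADJ[v]) - e:
            total += 1.0 / edges_inside({u, v, w})
    return total
```

On this toy graph there are exactly four 3-CISs, and the telescoped sum recovers that count; chaining such ratios stratum by stratum is what lets a count of small subgraphs bootstrap counts of larger ones.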
My empirical results show that the proposed methods offer a considerable improvement over the state of the art. Moreover, my methods run at a scale that was previously considered unreachable, not only by prior MCMC-based methods but also by other sampling approaches.
Optimization of Restricted Boltzmann Machines

In addition, I propose a regenerative transformation of MCMC samplers for Restricted Boltzmann Machines (RBMs). My approach, Markov Chain Las Vegas (MCLV), gives statistical guarantees in exchange for random running times. MCLV uses a stopping set built from the training data and a maximum Markov chain step count K (the resulting method is referred to as MCLV-K). I present an MCLV-K gradient estimator (LVS-K) for RBMs and explore the correspondences and differences between LVS-K and Contrastive Divergence (CD-K). LVS-K significantly outperforms CD-K in the task of training RBMs on the MNIST dataset, indicating that MCLV is a promising direction for learning generative models.

Tue, 26 Apr 2022 23:35:20 GMT
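The stopping-set mechanism can be sketched for a tiny RBM: run the block-Gibbs chain for at most K steps, halting early if the visible state re-enters a stopping set built from training data. Everything below (dimensions, the choice of training vectors as the stopping set, and the class itself) is a hypothetical illustration of that one mechanism, not the thesis's LVS-K gradient estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyRBM:
    """Minimal binary RBM with nv visible and nh hidden units."""

    def __init__(self, nv, nh):
        self.W = rng.normal(0.0, 0.1, (nv, nh))
        self.b = np.zeros(nv)  # visible biases
        self.c = np.zeros(nh)  # hidden biases

    def sample_h(self, v):
        p = sigmoid(v @ self.W + self.c)
        return (rng.random(p.shape) < p).astype(float), p

    def sample_v(self, h):
        p = sigmoid(h @ self.W.T + self.b)
        return (rng.random(p.shape) < p).astype(float)

    def mclv_negative_phase(self, v0, stop_set, K):
        """Gibbs chain of at most K steps that halts early when the
        visible state enters the stopping set (here: training vectors,
        stored as tuples). CD-K, by contrast, always runs all K steps."""
        v = v0.copy()
        for _ in range(K):
            h, _ = self.sample_h(v)
            v = self.sample_v(h)
            if tuple(v) in stop_set:
                break
        return v
```

The random stopping time is what makes the scheme "Las Vegas": the chain may halt early at a regeneration, and that randomness is traded for the statistical guarantees the abstract describes.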