MCMC-driven learning
Alexandre Bouchard-Côté, Trevor Campbell, Geoff Pleiss, Nikola Surjanovic
TL;DR
This work introduces Markovian optimization-integration (MOI), a unifying framework for problems where both the target distribution and the sampling kernel depend on parameters that are learned from the Markov chain, formalized as solving $g(\phi)=\mathbb{E}_{\pi_\phi}[g(X,\phi)]=0$ or minimizing $f(\phi)=\mathbb{E}_{\pi_\phi}[f(X,\phi)]$. It shows that a wide range of MCMC/ML tasks (forward and reverse KL variational inference, adaptive MCMC, transport-assisted MCMC, surrogate-based inference, coreset MCMC, and Markov chain gradient descent, among others) fit MOI, so that results for one method can be translated to the others under a common theoretical umbrella. The chapter surveys gradient estimation strategies (reparameterization and REINFORCE), automatic differentiation, mini-batching, and stabilization techniques. It then develops a convergence theory under deterministic, independent-noise, and Markovian-noise assumptions, including confinement and variance-reduction considerations. It closes with a case-study discussion of distribution approximation via forward KL minimization, tempering, and approximate transport maps, illustrating scalable MOI for learning expressive proposals and accelerating MCMC in big-data regimes.
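As a rough illustration of the root-finding formulation, the following is a minimal sketch of an MOI-style loop and is not taken from the chapter. It assumes a toy target $N(2,1)$, a proposal family $q_\phi = N(\phi,1)$ inside an independence Metropolis-Hastings kernel, the forward KL estimating function $g(x,\phi)=\phi-x$ (whose root is $\phi=\mathbb{E}_\pi[X]=2$), and a step size $t^{-0.7}$. For simplicity only the kernel depends on $\phi$ here, and all function names are hypothetical.

```python
import numpy as np


def log_target(x):
    # Unnormalized log density of the assumed toy target N(2, 1).
    return -0.5 * (x - 2.0) ** 2


def log_proposal(x, phi):
    # Unnormalized log density of the assumed proposal q_phi = N(phi, 1).
    return -0.5 * (x - phi) ** 2


def imh_step(x, phi, rng):
    # One independence Metropolis-Hastings step: the kernel depends on phi
    # through the proposal, while the target stays fixed in this toy example.
    y = phi + rng.standard_normal()
    log_ratio = (log_target(y) - log_target(x)) + (
        log_proposal(x, phi) - log_proposal(y, phi)
    )
    if np.log(rng.uniform()) < log_ratio:
        return y
    return x


def run_moi(n_iters=20000, phi0=0.0, seed=0):
    # Interleave one Markov chain step with one stochastic-approximation step
    # on phi, using the current chain state as a (Markovian-noise) estimate
    # of the expectation E_pi[g(X, phi)].
    rng = np.random.default_rng(seed)
    x, phi = float(phi0), float(phi0)
    for t in range(1, n_iters + 1):
        x = imh_step(x, phi, rng)
        g = phi - x                # forward KL gradient estimate for N(phi, 1)
        phi -= t ** (-0.7) * g     # assumed decaying step size
    return phi


if __name__ == "__main__":
    print(run_moi())
```

Because each update is a convex combination of the current parameter and the current chain state, the iterates stay bounded in this toy setting; more realistic problems would typically need the stabilization techniques mentioned above.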
Abstract
This paper is intended to appear as a chapter in the Handbook of Markov Chain Monte Carlo. The goal of this chapter is to unify, within one common framework, various problems at the intersection of Markov chain Monte Carlo (MCMC) and machine learning, including black-box variational inference, adaptive MCMC, normalizing flow construction and transport-assisted MCMC, surrogate-likelihood MCMC, coreset construction for MCMC with big data, Markov chain gradient descent, Markovian score climbing, and more. Doing so allows the theory and methods developed for each problem to be translated and generalized.
