Reinforcement Learning for Adaptive MCMC
Congye Wang, Wilson Chen, Heishiro Kanagawa, Chris J. Oates
TL;DR
This work reframes adaptive MCMC as a reinforcement learning task. It introduces Reinforcement Learning Metropolis–Hastings (RLMH), which learns a state-dependent Metropolis–Hastings (MH) proposal through a map φ that is parameterized by a neural network and optimized by policy gradient. The authors show that the resulting φ-MH kernel is p-invariant, and they prove that the adaptive chain is ergodic, converging to the target, provided that adaptation diminishes. They enforce this condition through control of the learning rate and gradient clipping. They implement a gradient-free variant using deep deterministic policy gradient (DDPG), which outperforms a popular gradient-free adaptive MH algorithm on about 90% of tasks in the PosteriorDB benchmark. The study sets out a general, theoretically supported route for applying RL to adaptive MCMC and suggests extensions to gradient-based proposals and related Monte Carlo methods.
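To see why a state-dependent proposal still leaves the target invariant, the sketch below implements one MH step with a Gaussian random-walk proposal whose scale φ(x) depends on the current state. Everything here is an illustrative assumption rather than the authors' implementation: `phi` is a hand-written stand-in for the learned neural network, the target is a standard Gaussian, and `phi_mh_step` and `log_gauss` are hypothetical helpers. Because the reverse move is proposed with scale φ(x*) instead of φ(x), the proposal densities no longer cancel, and the Hastings correction must appear in the acceptance ratio.

```python
import numpy as np

def log_target(x):
    # Hypothetical target: standard Gaussian log-density (up to a constant).
    return -0.5 * np.sum(x ** 2)

def phi(x):
    # Hypothetical stand-in for the learned map phi: a positive,
    # state-dependent proposal scale in (0.5, 1.0]. In RLMH this role
    # is played by a neural network.
    return 0.5 + 0.5 / (1.0 + np.sum(x ** 2))

def log_gauss(y, mean, scale):
    # Log-density of N(mean, scale^2 I) at y, dropping the 2*pi term,
    # which is identical in numerator and denominator of the MH ratio.
    d = y.size
    return -0.5 * np.sum((y - mean) ** 2) / scale ** 2 - d * np.log(scale)

def phi_mh_step(x, rng):
    # Forward proposal: x* ~ N(x, phi(x)^2 I).
    s_x = phi(x)
    x_star = x + s_x * rng.standard_normal(x.shape)
    s_star = phi(x_star)
    # Hastings ratio p(x*) q(x | x*) / (p(x) q(x* | x)); the reverse
    # proposal uses the scale at x*, so the q terms do not cancel.
    log_alpha = (log_target(x_star) + log_gauss(x, x_star, s_star)
                 - log_target(x) - log_gauss(x_star, x, s_x))
    if np.log(rng.uniform()) < log_alpha:
        return x_star, True
    return x, False

# Usage: run the chain and check the first two moments of the samples.
rng = np.random.default_rng(0)
x = np.zeros(2)
samples = []
for _ in range(20000):
    x, _ = phi_mh_step(x, rng)
    samples.append(x)
samples = np.array(samples)
# For a standard Gaussian target these should be near 0 and 1.
print(samples.mean(axis=0), samples.var(axis=0))
```

Dropping the `log_gauss` terms would still yield a runnable sampler, but one that targets the wrong distribution whenever φ varies with x. This is why the acceptance ratio must account for the state dependence of the proposal.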
Abstract
An informal observation, made by several authors, is that the adaptive design of a Markov transition kernel has the flavour of a reinforcement learning task. Yet, to date, it has remained unclear how to actually exploit modern reinforcement learning technologies for adaptive MCMC. The aim of this paper is to set out a general framework, called Reinforcement Learning Metropolis–Hastings, that is theoretically supported and empirically validated. Our principal focus is on learning fast-mixing Metropolis–Hastings transition kernels, which we cast as deterministic policies and optimise via a policy gradient. Control of the learning rate provably ensures that the conditions for ergodicity are satisfied. The methodology is used to construct a gradient-free sampler that outperforms a popular gradient-free adaptive Metropolis–Hastings algorithm on $\approx 90\%$ of tasks in the PosteriorDB benchmark.
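The sketch below illustrates the mechanism behind the ergodicity claim. It performs a policy-gradient ascent step with a clipped gradient and a decaying learning rate, so the change in the policy parameters θ between iterations is bounded by the current learning rate and shrinks to zero. If the transition kernel depends continuously on θ (an assumption here), this is one standard way to obtain diminishing adaptation. The schedule `lr0 * (n + 1) ** (-decay)`, the clipping threshold, and the helper names are hypothetical choices, not the paper's settings. The random "gradient" is only a stand-in for the estimate that DDPG's actor-critic updates would supply.

```python
import numpy as np

def clip_by_norm(g, max_norm):
    # Rescale g so that its Euclidean norm is at most max_norm.
    norm = np.linalg.norm(g)
    return g if norm <= max_norm else g * (max_norm / norm)

def learning_rate(n, lr0=0.1, decay=0.6):
    # Hypothetical schedule lr_n = lr0 * (n + 1)^(-decay), tending to zero.
    return lr0 * (n + 1) ** (-decay)

def adapt(theta, grad, n, max_norm=1.0):
    # One policy-gradient ascent step on the policy parameters theta.
    # Clipping bounds the update norm by learning_rate(n) * max_norm.
    return theta + learning_rate(n) * clip_by_norm(grad, max_norm)

rng = np.random.default_rng(1)
theta = np.zeros(3)
steps = []
for n in range(10000):
    # Stand-in for a large, noisy policy-gradient estimate.
    grad = 100.0 * rng.standard_normal(3)
    new_theta = adapt(theta, grad, n)
    steps.append(np.linalg.norm(new_theta - theta))
    theta = new_theta

print(steps[0], steps[-1])  # parameter changes shrink as n grows
```

Clipping matters because the decaying learning rate alone cannot bound the update when gradient estimates are heavy-tailed. With both in place, the bound on the step size holds regardless of how noisy the gradient is.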
