Generalizing Hamiltonian Monte Carlo with Neural Networks
Daniel Levy, Matthew D. Hoffman, Jascha Sohl-Dickstein
TL;DR
The paper tackles slow mixing of Markov chain Monte Carlo (MCMC) in high dimensions by learning a generalization of Hamiltonian Monte Carlo (HMC), called L2HMC, in which neural networks parameterize a leapfrog-like update on an augmented state. L2HMC is trained with an objective based on expected squared jumped distance (ESJD) plus a regularizer that encourages fast burn-in. It can implement non-volume-preserving transformations whose Jacobian determinant stays tractable, and it mixes significantly faster than standard HMC on ill-conditioned, highly correlated, and multimodal energy landscapes. The method also extends to latent-variable generative modeling. Reported gains include up to a 106× improvement in effective sample size on toy energy functions and a more expressive posterior in deep latent Gaussian model (DLGM) experiments on MNIST, and the code is released. Overall, L2HMC offers a versatile, black-box framework for fast, accurate MCMC on complex distributions, bridging traditional sampling techniques and modern neural methods.
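To make "leapfrog-like update with a tractable Jacobian" concrete, here is a minimal NumPy sketch of a generalized momentum half-step. It assumes an update of the form v' = v ⊙ exp(ε/2 · S) − ε/2 · (∇U(x) ⊙ exp(ε · Q) + T), where S, Q, and T would be learned networks. In this sketch they are hypothetical stand-in functions that see only x and the gradient, and the target `grad_U` (an anisotropic Gaussian) is likewise an illustrative assumption. Because S, Q, and T do not depend on v, the map v → v' is elementwise affine, so its log-determinant is just the sum of ε/2 · S.

```python
import numpy as np

def grad_U(x):
    # Hypothetical target: anisotropic Gaussian, U(x) = 0.5 * sum(x**2 / sigma**2).
    sigma = np.array([1.0, 10.0])
    return x / sigma**2

def momentum_half_step(x, v, eps, S, Q, T):
    """Generalized leapfrog momentum half-step (illustrative sketch).

    S, Q, T are hypothetical stand-ins for learned networks: each takes
    (x, grad) and returns an array shaped like v. Since none depends on v,
    the Jacobian of v -> v_new is diag(exp(eps/2 * S)).
    """
    g = grad_U(x)
    s, q, t = S(x, g), Q(x, g), T(x, g)
    # Rescale momentum (non-volume-preserving), then apply a modulated gradient kick.
    v_new = v * np.exp(0.5 * eps * s) - 0.5 * eps * (g * np.exp(eps * q) + t)
    # Log |det J| of the diagonal Jacobian is the sum of the log-scales.
    log_det = np.sum(0.5 * eps * s)
    return v_new, log_det
```

With S, Q, and T all returning zeros, the update reduces to the standard HMC leapfrog half-step v − (ε/2)∇U(x) with zero log-determinant. A nonzero S is what lets the step expand or contract volume in momentum space, while the log-determinant stays a cheap sum instead of a full determinant.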
Abstract
We present a general-purpose method to train Markov chain Monte Carlo kernels, parameterized by deep neural networks, that converge and mix quickly to their target distribution. Our method generalizes Hamiltonian Monte Carlo and is trained to maximize expected squared jumped distance, a proxy for mixing speed. We demonstrate large empirical gains on a collection of simple but challenging distributions, for instance achieving a 106× improvement in effective sample size in one case, and mixing when standard HMC makes no measurable progress in a second. Finally, we show quantitative and qualitative gains on a real-world task: latent-variable generative modeling. We release an open source TensorFlow implementation of the algorithm.
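As a rough illustration of the quantity being maximized, the sketch below estimates expected squared jumped distance from a batch of proposals. It weights each squared jump in position by its Metropolis-Hastings acceptance probability, because a rejected move leaves the chain where it was. The function names are hypothetical. The acceptance rule assumes a deterministic, invertible proposal that is its own inverse (as HMC's leapfrog trajectory followed by a momentum flip is), so the Jacobian log-determinant enters the acceptance ratio. This computes only the plain ESJD estimate, not the full L2HMC training loss described in the paper.

```python
import numpy as np

def accept_prob(log_p_cur, log_p_prop, log_det_jac):
    """Metropolis-Hastings acceptance for a deterministic involutive proposal.

    log_p_cur, log_p_prop: unnormalized log densities of the augmented state
    before and after the move. log_det_jac: log |det| of the proposal's
    Jacobian (zero for volume-preserving standard HMC).
    """
    log_ratio = log_p_prop - log_p_cur + log_det_jac
    # min(1, exp(r)) computed as exp(min(0, r)) to avoid overflow.
    return np.exp(np.minimum(0.0, log_ratio))

def esjd(x, x_prop, p_accept):
    """Monte Carlo estimate of E[A * ||x' - x||^2] over a batch of chains.

    x, x_prop: (batch, dim) current and proposed positions.
    p_accept: (batch,) acceptance probabilities for each proposal.
    """
    sq_jump = np.sum((x_prop - x) ** 2, axis=1)
    return np.mean(p_accept * sq_jump)
```

For example, with two chains whose proposals jump by squared distances 1 and 4 and are accepted with probabilities 1 and 0.5, the estimate is (1 · 1 + 0.5 · 4) / 2 = 1.5. A proposal that jumps far but is almost always rejected contributes little, which is why ESJD serves as a proxy for mixing speed rather than raw step size.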
