Generalizing Hamiltonian Monte Carlo with Neural Networks

Daniel Levy, Matthew D. Hoffman, Jascha Sohl-Dickstein

TL;DR

The paper tackles slow mixing in high-dimensional MCMC by learning a generalization of Hamiltonian Monte Carlo, called L2HMC, in which a neural network parameterizes a leapfrog-like update on an augmented state. Trained with an objective based on expected squared jumped distance (ESJD) plus a burn-in regularizer, L2HMC can implement non-volume-preserving transformations with a tractable Jacobian. It mixes significantly faster than standard HMC on ill-conditioned, highly correlated, and multimodal landscapes, and extends to latent-variable generative modeling. Reported empirical gains include up to a 106x improvement in effective sample size and better posterior expressivity, demonstrated on toy energy functions and MNIST-based DLGM experiments, with code released. Overall, L2HMC provides a versatile, black-box framework for fast, accurate MCMC in complex distributions, bridging traditional sampling techniques and modern neural methods.
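
For orientation, the standard HMC leapfrog step that L2HMC generalizes can be sketched as below. This is plain HMC with a unit mass matrix, not the L2HMC operator: the learned network terms described in the paper are not reproduced here. The names `leapfrog`, `grad_gaussian`, and `hamiltonian`, and the standard Gaussian target, are illustrative assumptions.

```python
def leapfrog(x, v, grad_u, eps):
    """One leapfrog step for H(x, v) = U(x) + 0.5 * |v|^2 (unit mass matrix)."""
    g = grad_u(x)
    # Half step for the momentum.
    v_half = [vi - 0.5 * eps * gi for vi, gi in zip(v, g)]
    # Full step for the position using the half-updated momentum.
    x_new = [xi + eps * vi for xi, vi in zip(x, v_half)]
    # Second half step for the momentum at the new position.
    g_new = grad_u(x_new)
    v_new = [vi - 0.5 * eps * gi for vi, gi in zip(v_half, g_new)]
    return x_new, v_new


def grad_gaussian(x):
    """Gradient of the illustrative energy U(x) = 0.5 * sum(x_i^2)."""
    return list(x)


def hamiltonian(x, v):
    """Total energy for the illustrative Gaussian target."""
    return 0.5 * sum(xi * xi for xi in x) + 0.5 * sum(vi * vi for vi in v)
```

This update is volume-preserving and reversible: negating the momentum and stepping again returns to the starting point. Those are the properties that L2HMC relaxes (volume preservation) or retains (a tractable Jacobian) according to the summary above.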

Abstract

We present a general-purpose method to train Markov chain Monte Carlo kernels, parameterized by deep neural networks, that converge and mix quickly to their target distribution. Our method generalizes Hamiltonian Monte Carlo and is trained to maximize expected squared jumped distance, a proxy for mixing speed. We demonstrate large empirical gains on a collection of simple but challenging distributions, for instance achieving a 106x improvement in effective sample size in one case, and mixing when standard HMC makes no measurable progress in a second. Finally, we show quantitative and qualitative gains on a real-world task: latent-variable generative modeling. We release an open source TensorFlow implementation of the algorithm.
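
As a rough illustration of the quantity named in the abstract, expected squared jumped distance can be estimated from a batch of proposals as the mean of the squared move weighted by the acceptance probability. The sketch below assumes Euclidean distance and takes the acceptance probabilities as given; the function name `esjd` and its arguments are hypothetical, and the paper's full training loss (including the burn-in regularization mentioned in the TL;DR) is not reproduced here.

```python
def esjd(states, proposals, accept_probs):
    """Monte Carlo estimate of E[ A(x'|x) * ||x' - x||^2 ].

    states, proposals: sequences of equal-length coordinate lists.
    accept_probs: Metropolis-Hastings acceptance probabilities in [0, 1],
    one per (state, proposal) pair.
    """
    total = 0.0
    for x, x_prop, a in zip(states, proposals, accept_probs):
        # Squared Euclidean distance between current state and proposal.
        sq_dist = sum((xp - xi) ** 2 for xi, xp in zip(x, x_prop))
        # A rejected proposal contributes no movement, hence the weighting.
        total += a * sq_dist
    return total / len(states)
```

For example, `esjd([[0.0, 0.0], [1.0, 1.0]], [[3.0, 4.0], [1.0, 2.0]], [0.5, 1.0])` averages `0.5 * 25` and `1.0 * 1`. A sampler that proposes large moves which are usually rejected scores as poorly as one that proposes tiny moves which are always accepted, which is why the quantity serves as a proxy for mixing speed.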

Paper Structure

This paper contains 30 sections, 16 equations, 5 figures, 1 table, 2 algorithms.

Figures (5)

  • Figure 1: L2HMC mixes faster than both well-tuned HMC and A-NICE-MC on a collection of toy distributions.
  • Figure 2: Training and held-out log-likelihood for models trained with L2HMC, HMC, and the ELBO (VAE).
  • Figure 3: Demonstrations of the value of a more expressive posterior approximation.
  • Figure 4: Diagram of our L2HMC-DLGM model. Nodes are functions of their parents. Round nodes are deterministic, diamond nodes are stochastic, and the doubly-circled node is observed.
  • Figure 5: The L2HMC-DLGM decoder produces sharper mean activations.