
MCMC using $\textit{bouncy}$ Hamiltonian dynamics: A unifying framework for Hamiltonian Monte Carlo and piecewise deterministic Markov process samplers

Andrew Chin, Akihiko Nishimura

TL;DR

This work establishes that the connection between HMC and PDMP samplers extends far beyond the specific instance of the zig-zag sampler, and turns this observation into a rigorous framework for constructing rejection-free Metropolis proposals based on bouncy Hamiltonian dynamics.

Abstract

Piecewise-deterministic Markov process (PDMP) samplers constitute a state-of-the-art Markov chain Monte Carlo paradigm in Bayesian computation, with examples including the zig-zag sampler and the bouncy particle sampler (BPS). Recent work on the zig-zag sampler has indicated its connection to Hamiltonian Monte Carlo (HMC), a version of the Metropolis algorithm that exploits Hamiltonian dynamics. Here we establish that, in fact, the connection between the two paradigms extends far beyond this specific instance. The key lies in (1) the fact that any time-reversible deterministic dynamics provides a valid Metropolis proposal and (2) how PDMPs' characteristic velocity changes constitute an alternative to the usual acceptance-rejection. We turn this observation into a rigorous framework for constructing rejection-free Metropolis proposals based on bouncy Hamiltonian dynamics, which simultaneously possess Hamiltonian-like properties and generate discontinuous trajectories similar in appearance to PDMPs. When combined with periodic refreshment of the inertia, the dynamics converge strongly to PDMP equivalents in the limit of increasingly frequent refreshment. We demonstrate the practical implications of this new framework with a sampler based on bouncy Hamiltonian dynamics closely related to the BPS. The resulting sampler exhibits competitive performance on challenging real-data posteriors involving tens of thousands of parameters. As the sampler of choice in modern probabilistic programming languages, HMC plays a critical role in applied Bayesian modeling; by generalizing the paradigm and elucidating its connection to the leading competitor, our framework opens up opportunities for cross-pollination and innovation to further scale Bayesian inference.
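Point (1) can be checked on the toy potentials from the Figure 1 caption, $U_\mathrm{sur}(x) = x_1^2/2 + x_2^2/2$ and $U_\mathrm{tar}(x) = x_1^2/2 + 9x_2^2/2$. The sketch below is a minimal illustration under our own assumptions (unit mass, a fixed integration time `t`, and hypothetical helper names `proposal_map` and `run_chain` that do not come from the paper). It runs the exact Hamiltonian flow of $U_\mathrm{sur}$, negates the final velocity so the map becomes a volume-preserving involution, and applies an ordinary Metropolis correction under $U_\mathrm{tar}$. Because the surrogate flow conserves $U_\mathrm{sur}(x) + |v|^2/2$, the log acceptance ratio reduces to $U_\mathrm{dif}(x) - U_\mathrm{dif}(x')$. This is plain surrogate-dynamics HMC, not the bouncy dynamics, so its proposals do get rejected.

```python
import numpy as np

# Toy target taken from the Figure 1 caption; U_sur enters only through the flow.
def U_tar(x):
    return 0.5 * x[0] ** 2 + 4.5 * x[1] ** 2   # x1^2/2 + 9 x2^2/2

def surrogate_flow(x, v, t):
    """Exact flow of dx/dt = v, dv/dt = -grad U_sur(x) = -x (unit mass assumed)."""
    c, s = np.cos(t), np.sin(t)
    return c * x + s * v, -s * x + c * v

def proposal_map(x, v, t):
    """Hypothetical helper: surrogate flow followed by velocity negation.
    The composite map is a volume-preserving involution, so the Metropolis
    ratio needs no Jacobian term."""
    x_new, v_new = surrogate_flow(x, v, t)
    v_new = -v_new
    h_old = U_tar(x) + 0.5 * v @ v
    h_new = U_tar(x_new) + 0.5 * v_new @ v_new
    return x_new, v_new, h_old - h_new      # log acceptance ratio

def run_chain(x0, n_iter, rng, t=1.0):
    """Hypothetical driver: Metropolis chain using the surrogate proposal."""
    x = np.asarray(x0, dtype=float)
    samples, n_accept = [], 0
    for _ in range(n_iter):
        v = rng.standard_normal(2)                # fresh Gaussian velocity
        x_new, _, log_r = proposal_map(x, v, t)
        if np.log(rng.uniform()) < log_r:         # Metropolis accept/reject
            x, n_accept = x_new, n_accept + 1
        samples.append(x.copy())
    return np.array(samples), n_accept / n_iter

rng = np.random.default_rng(0)
samples, accept_rate = run_chain(np.zeros(2), 2000, rng)
```

The acceptance rate stays below one precisely because nothing in the surrogate flow accounts for $U_\mathrm{dif}$; that discrepancy is what the deterministic reflections of the bouncy dynamics are designed to absorb.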

Paper Structure (17 sections, 4 theorems, 30 equations, 3 figures, 2 tables, 3 algorithms)

Key Result

Theorem 3.1

Assume that $U_\mathrm{sur}$ is twice continuously differentiable and the set $\{(x, \iota): \nabla U_\mathrm{dif}(x) = 0, \ \iota = 0\} \cup \{(x, v, \iota): v^\intercal \nabla U_\mathrm{dif}(x) = 0, \ \iota = 0\}$ consists of smooth manifolds o

Figures (3)

  • Figure 1: Illustration of proposal generation through surrogate dynamics. Here we use the surrogate $U_\mathrm{sur}(x) = x_1^2/2 + x_2^2/2$ to sample from the target $U_\mathrm{tar}(x) = - \log \pi(x) = x_1^2/2 + 9x_2^2/2$. The red curve shows a trajectory of the surrogate dynamics, the orange lines show $U_\mathrm{sur}$'s equipotential contours, and the light blue lines show $U_\mathrm{tar}$'s contours. The grey dotted lines show the contours of $U_\mathrm{dif} = U_\mathrm{tar} - U_\mathrm{sur} = 8x_2^2/2$. While the end point of the red trajectory constitutes a valid Metropolis proposal, it lies in a low-probability region of $U_\mathrm{tar}$ and is likely to be rejected. Our bouncy dynamics (dark blue), to be defined in the section on the general framework, yields a trajectory that deterministically reflects against hyperplanes orthogonal to $-\nabla U_\mathrm{dif}$ and thereby compensates for the discrepancy between $U_\mathrm{sur}$ and $U_\mathrm{tar}$. These deterministic reflections keep the trajectory in high-probability regions and make its end point rejection-free, i.e., accepted with probability one. The discontinuous change in velocity occurs deterministically as an integral part of the bouncy dynamics and, in particular, is distinct from the velocity refreshments commonly used in HMC and PDMP samplers to ensure ergodicity.
  • Figure 2: Trajectories of the HBPS dynamics (solid red) and the BPS dynamics (dashed blue) from the same initial position and velocity on a correlated bivariate Gaussian target.
  • Figure 3: Comparison, on the sparse logistic model, of the no-U-turn (red dashed) and manually tuned (red solid) HBPS against the BPS (blue points) under the five best travel-time parameters and the grid of refresh rates. The efficiency values shown are relative to the best-performing BPS. The optimally tuned BPS identified here is re-run five times for the final benchmark shown in Table 1; because those results come from separate runs, the relative efficiency of HBPS reported there differs slightly from that reported here.
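For a concrete picture of the BPS dynamics compared in Figure 2, the sketch below generates bounce points for a Gaussian target $U(x) = x^\intercal P x / 2$. It follows the standard BPS construction from the PDMP literature rather than anything defined in this summary: the particle moves in straight lines, each bounce time is drawn exactly by solving $\int_0^\tau \max(0, v^\intercal \nabla U(x + s v))\,ds = E$ with $E \sim \mathrm{Exp}(1)$, and at each event the velocity is reflected against the hyperplane orthogonal to $\nabla U$. Velocity refreshment is omitted, the function name `bps_trajectory` is our own, and the HBPS dynamics itself is not reproduced here.

```python
import numpy as np

def bps_trajectory(x0, v0, precision, n_bounces, rng):
    """Standard BPS dynamics (no refreshment) for U(x) = x^T P x / 2.
    Returns the bounce positions and the final velocity."""
    x = np.array(x0, dtype=float)
    v = np.array(v0, dtype=float)
    events = [x.copy()]
    for _ in range(n_bounces):
        a = v @ precision @ x          # v^T grad U(x) at the segment start
        b = v @ precision @ v          # positive for positive-definite P
        e = rng.exponential()          # E ~ Exp(1)
        # Solve int_0^tau max(0, a + b s) ds = e for the next bounce time.
        if a >= 0:
            tau = (-a + np.sqrt(a * a + 2.0 * b * e)) / b
        else:
            tau = -a / b + np.sqrt(2.0 * e / b)
        x = x + tau * v                # straight-line motion between events
        g = precision @ x              # grad U at the bounce point
        v = v - 2.0 * (v @ g) / (g @ g) * g   # reflect off hyperplane orthogonal to g
        events.append(x.copy())
    return np.array(events), v

cov = np.array([[1.0, 0.9], [0.9, 1.0]])   # correlated bivariate Gaussian
P = np.linalg.inv(cov)
rng = np.random.default_rng(1)
events, v_final = bps_trajectory([1.0, 0.0], [0.6, 0.8], P, 50, rng)
```

Each reflection preserves $|v|$ and flips the sign of $v^\intercal \nabla U$, so the particle always leaves a bounce heading downhill in $U$.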

Theorems & Definitions (4)

  • Theorem 3.1
  • Theorem 3.2: Weak Convergence
  • Theorem 3.3: Strong Convergence
  • Theorem 4.1