The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo

Matthew D. Hoffman, Andrew Gelman

TL;DR

The No-U-Turn Sampler (NUTS) tackles the practical bottleneck of Hamiltonian Monte Carlo (HMC) by eliminating the need to pre-specify the trajectory length L, while retaining HMC’s efficient exploration of high-dimensional posteriors. It achieves this with a recursive, binary-tree doubling trajectory-building procedure that stops when the trajectory would turn back on itself, preserving detailed balance, and with a dual averaging scheme to adapt the step size ε automatically. Empirical results show NUTS matches or surpasses tuned HMC in efficiency across several challenging models, while offering turnkey applicability suitable for automatic inference engines like Stan. The work also outlines memory-efficient implementations and discusses extensions such as mass matrix adaptations and windowed sampling for future improvements. This approach significantly broadens the practical usability of gradient-based MCMC in complex Bayesian models.

Abstract

Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metropolis or Gibbs sampling. However, HMC's performance is highly sensitive to two user-specified parameters: a step size ε and a desired number of steps L. In particular, if L is too small then the algorithm exhibits undesirable random walk behavior, while if L is too large the algorithm wastes computation. We introduce the No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. Empirically, NUTS performs at least as efficiently as, and sometimes more efficiently than, a well-tuned standard HMC method, without requiring user intervention or costly tuning runs. We also derive a method for adapting the step size parameter ε on the fly based on primal-dual averaging. NUTS can thus be used with no hand-tuning at all. NUTS is also suitable for applications such as BUGS-style automatic inference engines that require efficient "turnkey" sampling algorithms.
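To make the roles of ε and L concrete, the following is a minimal sketch of one standard HMC transition with a leapfrog integrator, where both parameters are fixed by hand. It is an illustration under stated assumptions and not the authors' code: it assumes an identity mass matrix and a standard normal target, and the names `leapfrog`, `hmc_step`, `log_p`, and `grad_log_p` are hypothetical.

```python
import math
import random


def leapfrog(theta, r, grad_log_p, eps):
    """One leapfrog step: half momentum update, full position update, half momentum update."""
    g = grad_log_p(theta)
    r = [ri + 0.5 * eps * gi for ri, gi in zip(r, g)]
    theta = [ti + eps * ri for ti, ri in zip(theta, r)]
    g = grad_log_p(theta)
    r = [ri + 0.5 * eps * gi for ri, gi in zip(r, g)]
    return theta, r


def hmc_step(theta, log_p, grad_log_p, eps, L, rng):
    """One HMC transition with a hand-chosen step size eps and number of steps L."""
    # Resample the momentum (identity mass matrix is an assumption of this sketch).
    r0 = [rng.gauss(0.0, 1.0) for _ in theta]
    theta_new, r = list(theta), list(r0)
    for _ in range(L):
        theta_new, r = leapfrog(theta_new, r, grad_log_p, eps)
    # Log joint density of position and momentum, up to a constant.
    log_joint_old = log_p(theta) - 0.5 * sum(x * x for x in r0)
    log_joint_new = log_p(theta_new) - 0.5 * sum(x * x for x in r)
    # Metropolis accept/reject corrects for the integrator's discretization error.
    accept_prob = math.exp(min(0.0, log_joint_new - log_joint_old))
    return theta_new if rng.random() < accept_prob else theta


# Assumed toy target: a one-dimensional standard normal.
def log_p(theta):
    return -0.5 * sum(t * t for t in theta)


def grad_log_p(theta):
    return [-t for t in theta]


rng = random.Random(0)
theta = [3.0]
samples = []
for _ in range(2000):
    theta = hmc_step(theta, log_p, grad_log_p, eps=0.2, L=10, rng=rng)
    samples.append(theta[0])
```

In this sketch the simulated trajectory length is ε · L. Choosing L by hand is exactly the tuning burden that NUTS is designed to remove.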

Paper Structure

This paper contains 23 sections, 27 equations, 7 figures, 6 algorithms.

Figures (7)

  • Figure 1: Example of building a binary tree via repeated doubling. Each doubling proceeds by choosing a direction (forwards or backwards in time) uniformly at random, then simulating Hamiltonian dynamics for $2^j$ leapfrog steps in that direction, where $j$ is the number of previous doublings (and the height of the binary tree). The figures at top show a trajectory in two dimensions (with corresponding binary tree in dashed lines) as it evolves over four doublings, and the figures below show the evolution of the binary tree. In this example, the directions chosen were forward (light orange node), backward (yellow nodes), backward (blue nodes), and forward (green nodes).
  • Figure 2: Example of a trajectory generated during one iteration of NUTS. The blue ellipse is a contour of the target distribution, the black open circles are the positions $\theta$ traced out by the leapfrog integrator and associated with elements of the set of visited states $\mathcal{B}$, the black solid circle is the starting position, the red solid circles are positions associated with states that must be excluded from the set $\mathcal{C}$ of possible next samples because their joint probability is below the slice variable $u$, and the positions with a red "x" through them correspond to states that must be excluded from $\mathcal{C}$ to satisfy detailed balance. The blue arrow is the vector from the positions associated with the leftmost to the rightmost leaf nodes in the rightmost height-3 subtree, and the magenta arrow is the (normalized) momentum vector at the final state in the trajectory. The doubling process stops here, since the blue and magenta arrows make an angle of more than 90 degrees. The crossed-out nodes with a red "x" are in the right half-tree, and must be ignored when choosing the next sample.
  • Figure 3: Discrepancies between the realized average acceptance probability statistic $h$ and its target $\delta$ for the multivariate normal, logistic regression, hierarchical logistic regression, and stochastic volatility models. Each point's distance from the x-axis shows how effectively the dual averaging algorithm tuned the step size $\epsilon$ for a single experiment. Leftmost plots show experiments run with NUTS; the other plots show experiments run with HMC, each with a different setting of $\epsilon L$.
  • Figure 4: Plots of the convergence of $\bar{\epsilon}$ as a function of the number of iterations of NUTS with dual averaging with $\delta=0.65$ applied to the multivariate normal (MVN), logistic regression (LR), hierarchical logistic regression (HLR), and stochastic volatility (SV) models. Each trace is from an independent run. The y-axis shows the value of $\bar{\epsilon}$, divided by one of the final values of $\bar{\epsilon}$ so that the scale of the traces for each problem can be readily compared.
  • Figure 5: Histograms of the trajectory lengths generated by NUTS with various acceptance rate targets $\delta$ for the multivariate normal (MVN), logistic regression (LR), hierarchical logistic regression (HLR), and stochastic volatility (SV) models.
  • ...and 2 more figures
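
The doubling scheme in the Figure 1 caption and the stopping rule in the Figure 2 caption can be sketched roughly as follows. This is a deliberately simplified illustration, not the full NUTS algorithm: it checks for a U-turn only between the two outermost states after each doubling, and it omits the slice variable $u$, the subtree checks, and the selection of the next sample from $\mathcal{C}$. The function names, the `max_depth` cap, and the standard normal target are assumptions made for this example.

```python
import random


def leapfrog(theta, r, grad_log_p, eps):
    """One leapfrog step; a negative eps integrates backwards in time."""
    g = grad_log_p(theta)
    r = [ri + 0.5 * eps * gi for ri, gi in zip(r, g)]
    theta = [ti + eps * ri for ti, ri in zip(theta, r)]
    g = grad_log_p(theta)
    r = [ri + 0.5 * eps * gi for ri, gi in zip(r, g)]
    return theta, r


def no_u_turn(theta_minus, r_minus, theta_plus, r_plus):
    """True while both endpoint momenta make an angle of at most 90 degrees
    with the vector from the backward-most to the forward-most position."""
    d = [p - m for p, m in zip(theta_plus, theta_minus)]
    dot_minus = sum(di * ri for di, ri in zip(d, r_minus))
    dot_plus = sum(di * ri for di, ri in zip(d, r_plus))
    return dot_minus >= 0 and dot_plus >= 0


def doubling_trajectory(theta0, r0, grad_log_p, eps, rng, max_depth=10):
    """Grow a trajectory by repeated doubling (simplified: endpoint check only)."""
    minus = (list(theta0), list(r0))  # backward-most state
    plus = (list(theta0), list(r0))   # forward-most state
    path = [list(theta0)]
    directions = []
    for j in range(max_depth):
        v = rng.choice([-1, 1])       # direction chosen uniformly at random
        directions.append(v)
        for _ in range(2 ** j):       # 2^j leapfrog steps, j = previous doublings
            if v == 1:
                plus = leapfrog(plus[0], plus[1], grad_log_p, eps)
                path.append(plus[0])
            else:
                minus = leapfrog(minus[0], minus[1], grad_log_p, -eps)
                path.insert(0, minus[0])
        if not no_u_turn(minus[0], minus[1], plus[0], plus[1]):
            break                     # trajectory has started to double back
    return path, directions


# Assumed toy target: a two-dimensional standard normal.
def grad_log_p(theta):
    return [-t for t in theta]


rng = random.Random(1)
path, directions = doubling_trajectory([1.0, 0.0], [0.0, 1.0], grad_log_p, 0.1, rng, max_depth=6)
```

After $n$ doublings the path holds $2^n$ positions, matching the leaf count of a binary tree of height $n$. Because this sketch inspects only the outermost pair of states, it can miss U-turns that occur inside a subtree, which the complete algorithm checks for.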