Sampling From Multiscale Densities With Delayed Rejection Generalized Hamiltonian Monte Carlo

Gilad Turok, Chirag Modi, Bob Carpenter

TL;DR

DR-G-HMC tackles sampling from hierarchical models with multiscale geometry by integrating delayed rejection into generalized HMC and employing dynamic, geometrically decreasing step sizes per iteration. This design mitigates trajectory reversals and enables large steps in flat regions while using small steps where curvature is high, yielding robust performance on Neal's funnel and various posterior densities. Empirical results show DR-G-HMC matches or exceeds the accuracy of NUTS while outperforming DR-HMC, particularly in multiscale settings, and exhibits insensitivity to several tuning parameters. The approach promises practical gains for Bayesian inference in complex models by combining efficiency, robustness, and compatibility with standard posterior targets.

Abstract

Hamiltonian Monte Carlo (HMC) is the mainstay of applied Bayesian inference for differentiable models. However, HMC still struggles to sample from hierarchical models that induce densities with multiscale geometry: a large step size is needed to efficiently explore low-curvature regions, while a small step size is needed to accurately explore high-curvature regions. We introduce the delayed rejection generalized HMC (DR-G-HMC) sampler, which overcomes this challenge by employing dynamic step size selection, inspired by differential equation solvers. In generalized HMC, each iteration does a single leapfrog step. DR-G-HMC sequentially makes proposals with geometrically decreasing step sizes upon rejection of earlier proposals. This simulates Hamiltonian dynamics that can adjust the step size along a (stochastic) Hamiltonian trajectory to deal with regions of high curvature. DR-G-HMC makes generalized HMC competitive by decreasing the number of rejections, which otherwise cause inefficient backtracking and prevent directed movement. We present experiments to demonstrate that DR-G-HMC (1) correctly samples from multiscale densities, (2) makes generalized HMC methods competitive with the state-of-the-art No-U-Turn sampler, and (3) is robust to tuning parameters.
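
The two building blocks named in the abstract, a single leapfrog step and a geometrically decreasing sequence of step sizes, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an identity mass matrix, and the names `leapfrog`, `step_size_schedule`, `reduction_factor`, and `max_proposals` are hypothetical. The delayed rejection acceptance probabilities and the momentum refresh of generalized HMC are deliberately left out.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def leapfrog(
    theta: Vector,
    rho: Vector,
    step_size: float,
    grad_log_density: Callable[[Vector], Vector],
) -> Tuple[Vector, Vector]:
    """One leapfrog step for H(theta, rho) = -log p(theta) + 0.5 * |rho|^2.

    Assumes an identity mass matrix (an assumption of this sketch).
    """
    # Half step for the momentum.
    grad = grad_log_density(theta)
    rho_half = [r + 0.5 * step_size * g for r, g in zip(rho, grad)]
    # Full step for the position.
    theta_new = [t + step_size * r for t, r in zip(theta, rho_half)]
    # Second half step for the momentum, using the gradient at the new position.
    grad_new = grad_log_density(theta_new)
    rho_new = [r + 0.5 * step_size * g for r, g in zip(rho_half, grad_new)]
    return theta_new, rho_new


def step_size_schedule(
    initial_step_size: float, reduction_factor: float, max_proposals: int
) -> List[float]:
    """Step sizes for successive proposals: eps, eps / r, eps / r^2, ...

    `reduction_factor` (r) and `max_proposals` are hypothetical tuning names.
    """
    return [initial_step_size / reduction_factor**k for k in range(max_proposals)]


if __name__ == "__main__":
    # Standard normal target: grad log p(theta) = -theta.
    grad = lambda th: [-t for t in th]
    for eps in step_size_schedule(0.8, 2.0, 3):
        print(eps, leapfrog([1.0, -0.5], [0.3, 0.2], eps, grad))
```

Each later proposal retries the same single-step move with a smaller step size, which is what lets the sampler take large steps in flat regions and fall back to small steps only where the large ones are rejected.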

Paper Structure

This paper contains 42 sections, 6 theorems, 41 equations, 15 figures, 2 tables.

Key Result

Proposition 1

If an AC transition kernel maintains detailed balance, then it is $\pi$-invariant. This follows by substituting eq:detailed_balance into eq:invariance and applying eq:normalization. (Non-AC transition kernels also maintain $\pi$-invariance, but the more complex proof is omitted here.)
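
The substitution behind Proposition 1 can be written out explicitly. The following is a sketch that assumes the usual forms of the three definitions and uses $K(\theta, \theta')$ as an assumed notation for the transition density from $\theta$ to $\theta'$; the paper's own symbols may differ. Assume normalization $\int K(\theta', \theta)\, d\theta = 1$ and detailed balance $\pi(\theta)\, K(\theta, \theta') = \pi(\theta')\, K(\theta', \theta)$. Then

$$
\int \pi(\theta)\, K(\theta, \theta')\, d\theta
= \int \pi(\theta')\, K(\theta', \theta)\, d\theta
= \pi(\theta') \int K(\theta', \theta)\, d\theta
= \pi(\theta'),
$$

where the first equality uses detailed balance and the last uses normalization. The resulting identity is the $\pi$-invariance condition.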

Figures (15)

  • Figure 1: (a) Neal's funnel exhibits multiscale geometry. Neal's funnel is challenging to sample from because its (negative) log density and Hessian condition number vary by orders of magnitude throughout the space. (b) DR-G-HMC handles multiscale geometry with dynamic step sizes. To sample from Neal's funnel, DR-G-HMC uses large step sizes in low-curvature regions ($x \gg 0$) and small step sizes in high-curvature regions ($x \ll 0$). See the appendix (app:funnel_fig_details) for figure details.
  • Figure 2: Average error vs gradient evaluations for Neal's funnel. Error in mean ($\mathcal{L}_{\theta, T}$) and variance ($\mathcal{L}_{\theta^2, T}$) averaged over $100$ chains of sampler draws from $10D$ Neal's funnel. NUTS's error plateaus while delayed rejection methods do not. Our method DR-G-HMC achieves the lowest error.
  • Figure 3: Histogram of log scale parameter $x$ in Neal's funnel. Sampler draws are aggregated across all $100$ chains of $10D$ Neal's funnel. Reference draws are sampled from the known density as $x \sim \textrm{normal}(0,3)$. DR-G-HMC and DR-HMC can sample deep into the highly curved neck ($x \ll 0$) and DR-G-HMC can sample deep into the mouth ($x \gg 0$) with dynamic step size selection, while NUTS cannot.
  • Figure 4: DR-G-HMC overcomes the inefficiencies of generalized HMC. Error in mean ($\mathcal{L}_{\theta,T}$) and variance ($\mathcal{L}_{\theta^2,T}$) is shown on the log scale for $100$ chains. Visual elements represent the following: dashed black line is the mean, solid gray line is the median, colored box is the $(25, 75)$th percentile, whiskers are $1.5$ times the inter-quartile range, and bubbles are outliers.
  • Figure 5: DR-G-HMC is robust to the damping tuning parameter $\gamma$. Error in mean ($\mathcal{L}_{\theta,T}$) and variance ($\mathcal{L}_{\theta^2,T}$) is shown for $100$ chains of various posterior densities. Visual elements represent the following: dashed black line is the mean, solid gray line is the median, colored box is the $(25, 75)$th percentile, whiskers are $1.5$ times the inter-quartile range, and bubbles are outliers.
  • ...and 10 more figures
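
Neal's funnel, the test density in Figures 1 to 3, can be sketched in a few lines. The captions state only that the log scale parameter is drawn as $x \sim \textrm{normal}(0,3)$ and that the experiments use the $10D$ funnel. The conditional $y_i \sim \textrm{normal}(0, e^{x/2})$ used below is the standard form of the funnel and is an assumption here, as are the function names.

```python
import math
import random
from typing import List, Sequence

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def funnel_log_density(x: float, y: Sequence[float]) -> float:
    """Log density of Neal's funnel (assumed standard form).

    x ~ normal(0, 3) and y_i | x ~ normal(0, exp(x / 2)), scales as std devs.
    """
    log_p_x = -LOG_SQRT_2PI - math.log(3.0) - x * x / 18.0
    # log normal(y_i | 0, exp(x / 2)) = -log sqrt(2 pi) - x / 2 - y_i^2 / (2 e^x)
    log_p_y = sum(
        -LOG_SQRT_2PI - 0.5 * x - 0.5 * yi * yi * math.exp(-x) for yi in y
    )
    return log_p_x + log_p_y


def funnel_reference_draw(dim: int, rng: random.Random) -> List[float]:
    """Exact draw [x, y_1, ..., y_{dim-1}] by sampling x first, then y given x."""
    x = rng.gauss(0.0, 3.0)
    scale = math.exp(0.5 * x)
    return [x] + [rng.gauss(0.0, scale) for _ in range(dim - 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    draw = funnel_reference_draw(10, rng)
    print(funnel_log_density(draw[0], draw[1:]))
    # The conditional scale of y spans orders of magnitude across the funnel.
    for x in (-6.0, 0.0, 6.0):
        print(x, math.exp(0.5 * x))
```

Under this form the scale of $y$ is $e^{x/2}$, so it shrinks rapidly in the neck ($x \ll 0$) and grows rapidly in the mouth ($x \gg 0$). No single fixed step size suits both regions, which is the multiscale difficulty the captions describe.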

Theorems & Definitions (16)

  • Definition 1: Transition kernel normalization
  • Definition 2: Invariance
  • Definition 3: Detailed balance
  • Proposition 1
  • Definition 4: Volume-Preserving
  • Definition 5: Involution
  • Definition 6: Shear
  • Proposition 2
  • proof
  • Lemma 1
  • ...and 6 more