The No-Underrun Sampler: A Locally-Adaptive, Gradient-Free MCMC Method

Nawaf Bou-Rabee, Bob Carpenter, Sifan Liu, Stefan Oberdörster

TL;DR

The No-Underrun Sampler (NURS) addresses gradient-free MCMC for multi-scale targets by combining No-U-Turn-style orbit doubling with Hit-and-Run: it samples a random direction, builds a lattice orbit along that line without gradient evaluations, and stops via a No-Underrun condition. The paper proves reversibility with respect to the target density $\mu$, a Wasserstein contraction bound for Gaussian targets, and a total variation (TV) bound between the NURS and Hit-and-Run transition kernels. It further analyzes NURS in Neal's funnel, deriving tuning guidelines and showing how local adaptation through orbit construction can yield favorable scaling relative to Random Walk Metropolis, particularly in the funnel's mouth, where large moves are advantageous. Together, these results position NURS as a practical, theoretically grounded gradient-free alternative for sampling in challenging multi-scale settings, with potential for parallelization and adaptivity enhancements in future work.

Abstract

In this work, we introduce the No-Underrun Sampler (NURS), a locally-adaptive, gradient-free Markov chain Monte Carlo method that blends ideas from Hit-and-Run and the No-U-Turn Sampler. NURS dynamically adapts to the local scale of the target distribution without requiring gradient evaluations, making it especially suitable for applications where gradients are unavailable or costly. We establish key theoretical properties, including reversibility, formal connections to Hit-and-Run and Random Walk Metropolis, Wasserstein contraction comparable to Hit-and-Run in Gaussian targets, and bounds on the total variation distance between the transition kernels of Hit-and-Run and NURS. Empirical experiments, supported by theoretical insights, illustrate the ability of NURS to sample from Neal's funnel, a challenging multi-scale distribution from Bayesian hierarchical inference.

Paper Structure

This paper contains 19 sections, 7 theorems, 80 equations, 23 figures, 1 table, 6 algorithms.

Key Result

Theorem 1

The transition kernel $\pi_{\mathrm{NURS}}$ is reversible with respect to the target $\mu$.
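
For readers who want the statement unpacked, reversibility can be written in the standard detailed-balance form. The display below is the textbook definition, not a formula copied from the paper; the sets $A$ and $B$ are notation introduced here, and $\mu$ is treated as a (possibly non-normalized) density on $\mathbb R^d$.

$$
\int_A \mu(\theta)\,\pi_{\mathrm{NURS}}(\theta, B)\,d\theta \;=\; \int_B \mu(\theta)\,\pi_{\mathrm{NURS}}(\theta, A)\,d\theta \qquad \text{for all measurable } A, B \subseteq \mathbb R^d .
$$

Taking $A = \mathbb R^d$ and using $\pi_{\mathrm{NURS}}(\theta, \mathbb R^d) = 1$ gives $\int \mu(\theta)\,\pi_{\mathrm{NURS}}(\theta, B)\,d\theta = \int_B \mu(\theta)\,d\theta$, so reversibility implies that $\mu$ is invariant under the kernel.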

Figures (23)

  • Figure 1: Starting from state $\theta$, the slice sampler first samples a height $y$ uniformly from $[0,\mu(\theta)]$, as shown in (a). It then samples a new state uniformly from the slice of state space where the (non-normalized) density $\mu$ exceeds this height, the gray-shaded region in (b).
  • Figure 2: Starting from a state $\theta$, Neal's doubling procedure begins by uniformly at random selecting an interval of size $w$ that contains $\theta$. The interval is then recursively doubled, expanding either to the left or right with equal probability. This doubling continues until both endpoints lie outside the slice, represented by the two bold line segments.
  • Figure 3: A transition of NURS from an initial state $\theta\in\mathbb R^2$ in the target shown in Figure 1. NURS first samples a direction $\rho$ from the unit sphere, defining the line along which the orbit will be built. The initial state is then randomized by applying a Metropolis-adjusted random shift $s\in[-h/2,\,h/2)$ along the line. Starting from the shifted state, NURS iteratively builds an orbit: a lattice with spacing $h$ on the line. The orbit is doubled at each step either forward or backward with equal probability. The doubling continues until the No-Underrun condition is met, indicating the orbit spans most of the target restricted to the line. From the selected orbit, the next state $\theta'$ of NURS is sampled according to the categorical distribution over the orbit, with probabilities proportional to the target's density evaluated at the orbit points.
  • Figure 4: Transition step of generalized Hit-and-Run from state $\theta$.
  • Figure 5: The No-Underrun condition ensures that for exponentially-tailed distributions with barriers exceeding $\epsilon$, the interval $(a,b)$ contains the bulk of the distribution, with the total probability mass outside $(a,b)$ being $O(\epsilon)$.
  • ...and 18 more figures
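
The two-step slice sampler transition described in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration, not code from the paper: the target is assumed to be a one-dimensional standard Gaussian so that the slice is an interval known in closed form, and the names `slice_step` and `run_chain` are hypothetical. It does not implement Neal's doubling procedure (Figure 2) or NURS itself (Figure 3).

```python
import math
import random


def slice_step(theta, rng):
    """One ideal slice-sampler transition for mu(x) = exp(-x^2 / 2).

    The Gaussian target is an assumption made for illustration: it lets
    the slice {x : mu(x) > y} be written exactly as the interval (-r, r).
    """
    # (a) Sample a height y uniformly from (0, mu(theta)].
    #     Work in log space: log y = -theta^2 / 2 + log u, with u in (0, 1].
    u = 1.0 - rng.random()
    log_y = -0.5 * theta * theta + math.log(u)

    # (b) The slice is {x : -x^2 / 2 > log y} = (-r, r), r = sqrt(-2 log y).
    #     Since log u <= 0, we have r >= |theta|, so theta lies in the slice.
    r = math.sqrt(-2.0 * log_y)

    # Sample the new state uniformly from the slice.
    return rng.uniform(-r, r)


def run_chain(n_steps, theta0=0.0, seed=0):
    """Run the slice sampler for n_steps and return the visited states."""
    rng = random.Random(seed)
    theta = theta0
    samples = []
    for _ in range(n_steps):
        theta = slice_step(theta, rng)
        samples.append(theta)
    return samples


if __name__ == "__main__":
    draws = run_chain(20000, seed=1)
    mean = sum(draws) / len(draws)
    var = sum((x - mean) ** 2 for x in draws) / len(draws)
    print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

For a general target the slice is not available in closed form, which is where an interval-finding scheme such as the doubling procedure of Figure 2 comes in.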

Theorems & Definitions (13)

  • Theorem 1
  • Lemma 1
  • proof
  • Theorem 2
  • Lemma 2
  • proof (of the Hit-and-Run contraction theorem)
  • Theorem 3
  • proof
  • Lemma 3
  • proof
  • ...and 3 more