Mixing of the No-U-Turn Sampler and the Geometry of Gaussian Concentration

Nawaf Bou-Rabee, Stefan Oberdörster

Abstract

We prove that the mixing time of the No-U-Turn Sampler (NUTS), when initialized in the concentration region of the canonical Gaussian measure, scales as $d^{1/4}$, up to logarithmic factors, where $d$ is the dimension. This scaling is expected to be sharp. This result is based on a coupling argument that leverages the geometric structure of the target distribution. Specifically, concentration of measure results in a striking uniformity in NUTS' locally adapted transitions, which holds with high probability. This uniformity is formalized by interpreting NUTS as an accept/reject Markov chain, where the mixing properties for the more uniform accept chain are analytically tractable. Additionally, our analysis uncovers a previously unnoticed issue with the path length adaptation procedure of NUTS, specifically related to looping behavior, which we address in detail.

Paper Structure

This paper contains 29 sections, 11 theorems, 88 equations, 15 figures, 3 algorithms.

Key Result

Theorem 1

The transition kernel $\pi_{\mathrm{NUTS}}$ is reversible with respect to the target distribution $\mu$.
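For readers less familiar with the term, reversibility in Theorem 1 can be spelled out as the usual detailed balance identity. The display below is only a restatement of the standard definition; the state variables $x$, $y$ and the set $A$ are notation assumed here and not taken from the paper.

$$
\mu(dx)\,\pi_{\mathrm{NUTS}}(x,dy) \;=\; \mu(dy)\,\pi_{\mathrm{NUTS}}(y,dx),
\qquad\text{which implies}\qquad
\int \mu(dx)\,\pi_{\mathrm{NUTS}}(x,A) \;=\; \mu(A).
$$

The second identity follows by integrating the first over $x$ in the whole space and $y\in A$, so reversibility in particular gives invariance of $\mu$ under the NUTS kernel.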

Figures (15)

  • Figure 1: In each transition step, NUTS iteratively builds a leapfrog orbit starting from the initial state, labeled as $0$. At each iteration, the orbit is doubled either forward or backward in time with equal probability, continuing until a U-turn is detected. Once a U-turn occurs, the next state of the NUTS chain is stochastically selected from the final orbit with sampling probabilities determined by the energies of the corresponding leapfrog iterates, as described in Algorithm \ref{algo:NUTS}.
  • Figure 2: Two orbits on $\sqrt d\,\mathcal{S}^{d-1}$ following the $2\pi$-periodic exact Hamiltonian flow: The left orbit of path length in $[0,\pi]$ does not exhibit the U-turn property \ref{eq:u-turn}, while the right orbit, with a path length in $(\pi,2\pi)$, does. In this idealized setting (starting on the $(d-1)$-sphere with tangential velocity of correct magnitude following the exact flow), the U-turn property of an orbit depends only on the orbit's path length and is uniform in the initial position $x$, consistent with the rotational symmetry of the sphere. In the realistic setting, where $x\in D_\alpha$, $v\sim\gamma$ and the leapfrog integrator is used, local effects emerge, as shown in Figure \ref{fig:sin}.
  • Figure 3: This figure illustrates \ref{eq:u-turn_disc}, which shows that the dot products in the U-turn property \ref{eq:u-turn} are within $O(\delta)$ of a sine function (gray). The deviations are in particular due to local effects. For orbit lengths (in physical time) within the highlighted intervals, the deviations do not affect the signs of the dot products, implying the U-turn property is independent of local effects and thus uniform in position $x$. This implies that the orbit selection consistently finds orbits of uniform length, as described in \ref{eq:chosenOL_disc}.
  • Figure 6: This figure illustrates a potential solution to NUTS' looping issue in dimension $d=10^4$ using $50$ independent realizations each started from stationarity. Orbit lengths are plotted against NUTS' iteration number. (a) uses a fixed leapfrog step size of $h=0.1$ where condition \ref{eq:thm_h_rest} is not met, as in Figure \ref{fig:orbitfig} (b). (b) applies step-size randomization at each NUTS transition step. (c) applies step-size randomization at each leapfrog integration step. (Figure courtesy of and used with permission from Tore Selland Kleppe.)
  • Figure 7: For an index set $I$ with $|I|=6$, the Multinoulli distribution $\mathrm{Multinoulli}(a_i)_{i\in I}$ with normalized weights $\sum_{i\in I}a_i=1$ is split into its maximal uniform part $|I|\min_{i\in I}a_i\,\mathrm{Unif}(I)$ (shown in gray) and the remaining part $(1-|I|\min_{i\in I}a_i)\,\mathrm{Multinoulli}(a_i-\min_{i\in I}a_i)_{i\in I}$ (shown in red).
  • ...and 10 more figures
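
The decomposition in the Figure 7 caption can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `split_multinoulli` and `sample_via_split` and the example weights are hypothetical, chosen only for illustration. It splits normalized weights $(a_i)_{i\in I}$ into the uniform mass $|I|\min_{i\in I}a_i$ and the residual weights $a_i-\min_{i\in I}a_i$, then samples by first choosing between the two parts.

```python
import random


def split_multinoulli(weights):
    """Split normalized weights into (uniform mass, residual weights).

    The uniform mass is |I| * min(a_i); the residual weights are
    a_i - min(a_i), left unnormalized (they sum to 1 - uniform mass).
    """
    n = len(weights)
    m = min(weights)
    uniform_mass = n * m
    residual = [a - m for a in weights]
    return uniform_mass, residual


def sample_via_split(weights, rng):
    """Draw one index from Multinoulli(weights) using the two-part mixture."""
    uniform_mass, residual = split_multinoulli(weights)
    # If every weight is equal, the residual part carries no mass at all.
    if sum(residual) <= 0.0 or rng.random() < uniform_mass:
        return rng.randrange(len(weights))  # uniform part
    # Remaining part: random.choices normalizes the residual weights itself.
    return rng.choices(range(len(weights)), weights=residual, k=1)[0]


if __name__ == "__main__":
    a = [0.05, 0.10, 0.15, 0.20, 0.20, 0.30]  # hypothetical weights, |I| = 6
    u, r = split_multinoulli(a)
    # Mixture check: u / |I| + r_i should reproduce each a_i.
    print([round(u / len(a) + ri, 12) for ri in r])
    print(sample_via_split(a, random.Random(0)))
```

Since the residual weights sum to one minus the uniform mass, the mixture assigns probability $\min_{j\in I}a_j + (a_i-\min_{j\in I}a_j) = a_i$ to index $i$, so the two-stage draw has the same law as a direct Multinoulli draw.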

Theorems & Definitions (21)

  • Theorem 1
  • Theorem 2: Main Result
  • Remark 1: Mixing Time Guarantee
  • Theorem 3
  • Lemma 1
  • proof: Proof of Lemma \ref{lem:E}
  • Lemma 2
  • proof: Proof of Lemma \ref{lem:u-turn}
  • Lemma 3
  • proof
  • ...and 11 more