Table of Contents
Fetching ...

Resonances in reflective Hamiltonian Monte Carlo

Namu Kroupa, Gábor Csányi, Will Handley

TL;DR

This work investigates slow mixing in reflective Hamiltonian Monte Carlo with inexact reflections when the particle ensemble is initialised from a Dirac delta and targets a uniform distribution in high dimensions. It introduces Sinkhorn divergences to quantify non-uniformity and analyzes dynamics on the unit sphere and cube, revealing resonances and a fluid-to-discretisation transition with a dimension-dependent critical step size. A lossless two-dimensional representation and low-dimensional toy models capture the dominant features, and a spectral analysis shows a dominant supersonic frequency alongside dispersion effects. The findings have practical implications for tuning in nested sampling and motivate preconditioning and diagnostics to mitigate resonances in high-dimensional reflective-HMC implementations.

Abstract

In high dimensions, reflective Hamiltonian Monte Carlo with inexact reflections exhibits slow mixing when the particle ensemble is initialised from a Dirac delta distribution and the uniform distribution is targeted. By quantifying the instantaneous non-uniformity of the distribution with the Sinkhorn divergence, we elucidate the principal mechanisms underlying the mixing problems. In spheres and cubes, we show that the collective motion transitions between fluid-like and discretisation-dominated behaviour, with the critical step size scaling as a power law in the dimension. In both regimes, the particles can spontaneously unmix, leading to resonances in the particle density and the aforementioned problems. Additionally, low-dimensional toy models of the dynamics are constructed which reproduce the dominant features of the high-dimensional problem. Finally, the dynamics is contrasted with the exact Hamiltonian particle flow and tuning practices are discussed.

Resonances in reflective Hamiltonian Monte Carlo

TL;DR

This work investigates slow mixing in reflective Hamiltonian Monte Carlo with inexact reflections when the particle ensemble is initialised from a Dirac delta and targets a uniform distribution in high dimensions. It introduces Sinkhorn divergences to quantify non-uniformity and analyzes dynamics on the unit sphere and cube, revealing resonances and a fluid-to-discretisation transition with a dimension-dependent critical step size. A lossless two-dimensional representation and low-dimensional toy models capture the dominant features, and a spectral analysis shows a dominant supersonic frequency alongside dispersion effects. The findings have practical implications for tuning in nested sampling and motivate preconditioning and diagnostics to mitigate resonances in high-dimensional reflective-HMC implementations.

Abstract

In high dimensions, reflective Hamiltonian Monte Carlo with inexact reflections exhibits slow mixing when the particle ensemble is initialised from a Dirac delta distribution and the uniform distribution is targeted. By quantifying the instantaneous non-uniformity of the distribution with the Sinkhorn divergence, we elucidate the principal mechanisms underlying the mixing problems. In spheres and cubes, we show that the collective motion transitions between fluid-like and discretisation-dominated behaviour, with the critical step size scaling as a power law in the dimension. In both regimes, the particles can spontaneously unmix, leading to resonances in the particle density and the aforementioned problems. Additionally, low-dimensional toy models of the dynamics are constructed which reproduce the dominant features of the high-dimensional problem. Finally, the dynamics is contrasted with the exact Hamiltonian particle flow and tuning practices are discussed.

Paper Structure

This paper contains 20 sections, 25 equations, 14 figures.

Figures (14)

  • Figure 1: Possible moves of the GMC Markov Chain. A particle starting at $\mathbf{q}(t)$ with momentum $\mathbf{p}(t)$ inside a volume $\mathcal{U}$ attempts to move forward (1), reflect (2) or reverse its direction (3). Since the motion is discretised, the particle will overstep the boundary of $\mathcal{U}$ so that the normal vector $\mathbf{\hat{n}}(\mathbf{q})$ on the outside is used for reflections. This necessitates the definition of $\mathbf{\hat{n}}(\mathbf{q})$ as a vector field outside $\mathcal{U}$, which we define by considering scaled copies of the boundary (dotted line).
  • Figure 2: Sinkhorn divergence (SD) of an ensemble of Markov Chains in the sphere (left) and the cube (right) in $n\xspace=100$ dimensions, measured with respect to the uniform distribution. Time is measured in Monte Carlo steps (MCS). Initialised from a randomly chosen point, $10^3$ particles are evolved under dynamics with momenta drawn independently from a multivariate Gaussian $\mathcal{N}$ with standard deviation $\sigma_p$, commonly known as the step size of the algorithm. Every $L=400\,\mathrm{MCS}$, the momenta are re-randomised. Within a single trajectory, i.e. up to the momentum re-randomisation, the particle ensemble converges to a subspace of the full volume so that the stagnates on a timescale given by the dispersion the particle distribution. The observed oscillations are caused by temporary bunching (resonances) of the particles, both due to the inexact reflections present in the dynamics and concentration phenomena in high dimensions. Momentum re-randomisation effectively restarts the algorithm, leading to an approximately repeating pattern in the , which converges to zero on longer timescales since the stationary distribution of the Markov Chain is uniform in the volume.
  • Figure 3: (a) Action of the rotation map $\mathbf{R}_i$ (Equation \ref{['eqn:disk-map']}). In the sphere $S^{n-1}$, a trajectory lies in a two-dimensional disc spanned by the initial position $\mathbf{q}_i(0)$ and momentum $\mathbf{p}_i(0)$. All particles have different momenta but are initialised from the same point and hence share the axis $\mathbf{q}_i(0)$. Their individual discs can therefore be rotated onto the $\mathbf{q}_1$-$\mathbf{q}_2$-plane. This maps the angle $\psi$ to zero but preserves $\theta$. (b) Image of the rotation map. The radial and angular distributions of the initial momentum concentrate around $\sigma_p\sqrt{n\xspace}\pm\frac{\sigma_p}{\sqrt{2}}$ and $\pm\frac{\pi}{2}\pm(n\xspace-2)^{-1/2}$, respectively. To first order in $\frac{1}{n\xspace}$, the distance of the initial position is $1-\frac{2}{n\xspace}$ from the origin. The dominant motion in high dimensions therefore approximately follows a path close to the boundary, converging at an antipodal point.
  • Figure 4: Angular and radial particle density for different times $t$ in the sphere for $n\xspace=2\times 10^3$ and $\sigma_p=\tilde{\sigma}_p\sqrt{\frac{100}{n\xspace}}$. The angle $\theta$ is defined through the rotation map (Equation \ref{['eqn:disk-map']}). The dispersion and the splitting of the particle distribution is visible for $\tilde{\sigma}_p=8\times 10^{-3}$ at $t=10\,\mathrm{MCS}\xspace$, with the faster particles travelling at the supersonic velocity defined in Equation \ref{['eqn:supersonic-frequency']} and the slower particles in a smaller wave packet behind them. At $t=20\,\mathrm{MCS}\xspace$, the wave packet merges at $\theta=\pi$ with the one travelling in the opposite direction. For $\tilde{\sigma}_p=24\times 10^{-3}$, the wave packet travels faster, however this motion appears slower due to aliasing. The radial distribution is effectively fixed at the initial radius, which is expected from the concentration of the angular distribution $p(\phi)\propto|\sin\phi|^{n\xspace-2}$. The arrow denotes a delta function and the percentage is the fraction of particles at the delta function. For comparison, the dotted straight line is the radial distribution $p(r)=n\xspace r^{n\xspace-1}$ of the uniform distribution in the sphere on a log-log axis. There are no particles deeper inside the sphere than shown. This creates an under-density compared to the uniform distribution or, equivalently, an over-density near the boundary.
  • Figure 5: Power spectral density $\mathrm{PSD}(f)$ of the Sinkhorn divergence against the standard deviation $\sigma_p$ of the momentum distribution in the sphere in $n\xspace=100$ dimensions. Increasing $\sigma_p$ increases the average momentum magnitude of the particles, leading to higher-frequency oscillations in the particle density up to the Nyquist frequency $f_\mathrm{N}=0.5\,\mathrm{MCS}^{-1}$ when $\sigma_p\sqrt{n}\sim 1$. The dominant frequency $f_\mathrm{super}>f_\mathrm{diag}$ indicates that most particles travel with larger effective momentum close to the boundary. Dispersion of the particle wave front leads to broadening of the $\mathrm{PSD}$ on a scale of $f_\mathrm{broad}$.
  • ...and 9 more figures