Table of Contents
Fetching ...

Exponential speed up in Monte Carlo sampling through Radial Updates

Johann Ostmeyer

TL;DR

This work introduces a general radial-update framework for MCMC that augments any angular sampler with a radial move driven by a substitution $r=f(z)$ so that the effective potential $V_{ ext{eff}}(z,\theta)$ grows exponentially in $|z|$, ensuring exponential convergence on non-compact spaces. By operating the radial update in the auxiliary variable $z$ with additive Gaussian steps and an appropriate substitution (often $r=e^{z}$ for polynomial tails), the method achieves near-optimal autocorrelation and dramatically speeds sampling of heavy-tailed distributions. The authors prove convergence under a broad set of drift conditions, classify convergence rates, and provide practical tuning rules, notably $\sigma \approx \sqrt{\frac{2}{a d}}$ for polynomial potentials, with $d$ the dimension, and target acceptance rates around 0.42–0.5. The approach is demonstrated through numerical examples and is shown to be compatible with HMC, yielding orders-of-magnitude speedups for challenging targets, thus offering a powerful, versatile tool for efficient MCMC on non-compact spaces.

Abstract

Recently, it has been shown that the hybrid Monte Carlo (HMC) algorithm is guaranteed to converge exponentially to a given target probability distribution $p(x)\propto e^{-V(x)}$ on non-compact spaces if augmented by an appropriate radial update. In this work we present a simple way to derive efficient radial updates meeting the necessary requirements for any potential $V$. We reduce the problem to finding a substitution for the radial direction $||x||=f(z)$ so that the effective potential $V(f(z))$ grows exponentially with $z\rightarrow\pm\infty$. Any additive update of $z$ then leads to the desired convergence. We show that choosing this update from a normal distribution with standard deviation $σ\approx 1/\sqrt{d}$ in $d$ dimensions yields very good results. We further generalise the previous results on radial updates to a wide class of Markov chain Monte Carlo (MCMC) algorithms beyond the HMC and we quantify the convergence behaviour of MCMC algorithms with badly chosen radial update. Finally, we apply the radial update to the sampling of heavy-tailed distributions and achieve a speed up of many orders of magnitude.

Exponential speed up in Monte Carlo sampling through Radial Updates

TL;DR

This work introduces a general radial-update framework for MCMC that augments any angular sampler with a radial move driven by a substitution so that the effective potential grows exponentially in , ensuring exponential convergence on non-compact spaces. By operating the radial update in the auxiliary variable with additive Gaussian steps and an appropriate substitution (often for polynomial tails), the method achieves near-optimal autocorrelation and dramatically speeds sampling of heavy-tailed distributions. The authors prove convergence under a broad set of drift conditions, classify convergence rates, and provide practical tuning rules, notably for polynomial potentials, with the dimension, and target acceptance rates around 0.42–0.5. The approach is demonstrated through numerical examples and is shown to be compatible with HMC, yielding orders-of-magnitude speedups for challenging targets, thus offering a powerful, versatile tool for efficient MCMC on non-compact spaces.

Abstract

Recently, it has been shown that the hybrid Monte Carlo (HMC) algorithm is guaranteed to converge exponentially to a given target probability distribution on non-compact spaces if augmented by an appropriate radial update. In this work we present a simple way to derive efficient radial updates meeting the necessary requirements for any potential . We reduce the problem to finding a substitution for the radial direction so that the effective potential grows exponentially with . Any additive update of then leads to the desired convergence. We show that choosing this update from a normal distribution with standard deviation in dimensions yields very good results. We further generalise the previous results on radial updates to a wide class of Markov chain Monte Carlo (MCMC) algorithms beyond the HMC and we quantify the convergence behaviour of MCMC algorithms with badly chosen radial update. Finally, we apply the radial update to the sampling of heavy-tailed distributions and achieve a speed up of many orders of magnitude.

Paper Structure

This paper contains 11 sections, 22 theorems, 59 equations, 5 figures, 1 table.

Key Result

Proposition 3.4

An def:markov_kernel$\mathcal{P}$ defined by $x\mapsto x + \eta$ with $\eta\sim \mathcal{N}(0,\Sigma)$ sampled from a multivariate normal distribution satisfies the def:compact_doeblin.

Figures (5)

  • Figure 1: Histograms of the Markov chains ($10^5$ steps) generated using \ref{['alg:poly_pot', 'alg:any_pot']}, respectively, based on \ref{['th:dbl_exp_subst', 'th:updates_in_practice', 'th:scaling_sigma']}. The potential $V(x)$ depends only on the radius $|x|\equiv r=f(z)$ and the angular component was not sampled for simplicity. In both cases $z\in \mathds{R}$ was updated by $z\rightarrow z+\gamma$ with $\gamma\sim\mathcal{N}(0,\frac{2}{d})$. Left: $d=100$ dimensions, $V(x)=|x|$ (i.e. $p(r)\propto r^{d-1}\mathrm{e}^{-r}$) with the substitution $r=\mathrm{e}^{z}$ and \ref{['def:eff_pot']}$V_\text{eff}(z)=\mathrm{e}^{z}-dz$; Right: $d=1$ dimension, $V(x)=\ln\left(1+|x|^{1.01}\right)$ (i.e. $p(r)\propto\frac{1}{1+r^{1.01}}$) with the substitution $r=\mathrm{e}^{\sinh z}$ and $V_\text{eff}(z)=\ln\left(1+\mathrm{e}^{1.01\sinh z}\right)-\sinh z - \ln\cosh z$.
  • Figure 2: Time series (left) and histogram (right) of the Markov chains ($3.0e5$ steps) generated using additive updates $r\rightarrow r+\gamma$ with $\gamma\sim\mathcal{N}(0,2)$ without appropriate \ref{['def:radial_update']}. The radial potential $V(r)=\ln\left(1+r^{1.1}\right)$ (i.e. $p(r)\propto\frac{1}{1+r^{1.1}}$) of the radius $r\ge 0$ was considered in $d=1$ dimension. The different time series correspond to different random number seeds.
  • Figure 3: Convergence of the Lyapunov function Lyapounov1907Meyn_Tweedie_1992meyn1993markov (equivalent to the potential) for different potentials in $d=1$ dimension. In all cases the update $z\rightarrow z+\gamma$ with $\gamma\sim \mathcal{N}(0,1)$ has been used. The different time series in every panel correspond to different random number seeds. The potentials $V(z)=\cosh(z)|z|\sqrt{|z|} \text{ and } \ln\left(1+z^2\right)$ in the $\text{top left}\text{top right}\text{bottom}$ panels grow $\text{exponentially}\text{linearly}\text{sub-linearly}$ and thus satisfy the $\ref{['def:sgdc']}\ref{['def:sadc']}\ref{['def:ladc']}$ leading to an $\text{exponential}\text{linear}\text{diffusive}$\ref{['def:approach']} of the exponential convergence region according to \ref{['th:classification']}.
  • Figure 4: Time series (left) and autocorrelation functions (right) of Markov chains generated using \ref{['alg:poly_pot']} based on \ref{['th:dbl_exp_subst', 'th:updates_in_practice']} with different standard deviations $\sigma$ in the update of $z\rightarrow z+\gamma$ with $\gamma\sim\mathcal{N}(0,\sigma^2)$. The potential $V(x)=\frac{1}{2} |x|^2$ (i.e. $p(r)\propto r^{d-1}\mathrm{e}^{-\frac{1}{2}r^2}$) of the radius $r=\mathrm{e}^{z}$ in $d=100$ dimensions (i.e. $V_\text{eff}(z)=\frac{1}{2}\mathrm{e}^{2z}-dz$) was treated as a one-dimensional problem and the angular component was not sampled for simplicity. Prediction \ref{['eq:best_sigma']} was used for $\sigma_0=\frac{1}{10}$.
  • Figure 5: Left: \ref{['def:tau_int']} of Markov chains in the same setting as figure \ref{['fig:time_series_autocorr']} as a function of the standard deviation $\sigma$ for different dimensions $d$. The fit function has the form $\tau_\text{int} = a\sigma^{-2} + b\sigma + c$. Right: Minima $\sigma_\text{min}=\left(\frac{a}{b}\right)^{1/3}$ obtained from the fits of $\tau_\text{int}$. Prediction \ref{['eq:best_sigma']} suggests $\sigma_\text{min}=\frac{1}{\sqrt d}$ while a fit $\sigma_\text{min}=\frac{\sigma^*}{\sqrt d}$ yields $\sigma^*=1.528\pm 0.007$.

Theorems & Definitions (74)

  • Definition 3.1: Markov transition kernel hairer2008look
  • Remark
  • Definition 3.2: Metropolis-Hastings accept/reject step Metropolis:1953Hastings_1970
  • Remark
  • Definition 3.3: Compact Doeblin's condition hairer2008lookoriginal_radial_updateDoeblin:1938Doeblin:1940Doob:1948
  • Remark
  • Proposition 3.4: Gaussian update
  • proof
  • Definition 3.5: Strong geometric drift condition Harris:1955hairer2008look
  • Remark
  • ...and 64 more