Time-complexity of sampling from a multimodal distribution using sequential Monte Carlo

Ruiyu Han, Gautam Iyer, Dejan Slepčev

TL;DR

This work develops a rigorous analysis of Annealed Sequential Monte Carlo (ASMC) for sampling from Gibbs measures at low temperature in non-convex landscapes, using geometric annealing and Langevin steps at each level. By exploiting local mixing within energy valleys together with a resampling step between levels, the authors prove that ASMC estimators converge with time complexity that is polynomial in the inverse temperature and the desired accuracy: approximately the fourth power of the inverse temperature and the square of the inverse allowed error. They analyze two settings: a Local Mixing Model with explicit constants, and a double-well energy on the torus, where spectral properties of the Langevin generator yield polynomial bounds under nondegeneracy and mass-ratio assumptions. The paper also provides numerical experiments illustrating the level-wise mass balancing and error behavior, and surveys related tempering and AIS/SMC methods to position ASMC as a dimension-independent, structure-agnostic approach to multimodal sampling. Overall, the results suggest a practically efficient route to low-temperature sampling in multimodal settings where global mixing is prohibitively slow.

Abstract

We study a sequential Monte Carlo algorithm to sample from the Gibbs measure with a non-convex energy function at a low temperature. We use the practical and popular geometric annealing schedule, and use a Langevin diffusion at each temperature level. The Langevin diffusion only needs to run for a time that is long enough to ensure local mixing within energy valleys, which is much shorter than the time required for global mixing. Our main result shows convergence of Monte Carlo estimators with time complexity that, approximately, scales like the fourth power of the inverse temperature, and the square of the inverse allowed error. We also study this algorithm in an illustrative model scenario where more explicit estimates can be given.
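The loop described in the abstract (reweight, resample, then run a short Langevin diffusion at each temperature level) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the one-dimensional double-well energy `U`, the Euler-Maruyama discretization, multinomial resampling, the uniform initialization, and the choice to take the temperatures themselves in geometric progression are all assumptions made here, and every function and parameter name is hypothetical.

```python
import math
import random


def U(x):
    """Illustrative double-well energy with equal-depth wells at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def grad_U(x):
    """Derivative of U."""
    return 4.0 * x * (x * x - 1.0)


def langevin(x, eta, T, h, rng):
    """Euler-Maruyama steps for dY = -U'(Y) dt + sqrt(2 eta) dW, run for time T."""
    for _ in range(int(round(T / h))):
        x = x - h * grad_U(x) + math.sqrt(2.0 * eta * h) * rng.gauss(0.0, 1.0)
    return x


def temperatures(eta_first, eta_last, levels):
    """Assumed schedule: temperatures decreasing by a constant ratio."""
    ratio = (eta_last / eta_first) ** (1.0 / (levels - 1))
    return [eta_first * ratio ** k for k in range(levels)]


def asmc(n_particles, etas, T, h, seed=0):
    """Return particles approximating the Gibbs measure exp(-U/eta) at etas[-1]."""
    rng = random.Random(seed)
    # Start spread over both wells, then relax at the hottest temperature.
    xs = [langevin(rng.uniform(-2.0, 2.0), etas[0], T, h, rng)
          for _ in range(n_particles)]
    for eta_old, eta_new in zip(etas, etas[1:]):
        # Importance weights for moving from eta_old to eta_new, up to a constant.
        log_w = [-(1.0 / eta_new - 1.0 / eta_old) * U(x) for x in xs]
        top = max(log_w)  # subtract the maximum for numerical stability
        w = [math.exp(lw - top) for lw in log_w]
        # Multinomial resampling moves mass between wells without barrier crossing.
        xs = rng.choices(xs, weights=w, k=n_particles)
        # Short Langevin run: only local mixing inside each well is needed.
        xs = [langevin(x, eta_new, T, h, rng) for x in xs]
    return xs
```

Calling `asmc(400, temperatures(1.0, 0.05, 6), T=1.0, h=0.01)` returns 400 particles at the final temperature. The point of the design is that `T` is chosen from the within-well relaxation time rather than the barrier-crossing time, which grows exponentially as the temperature decreases.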

Paper Structure

This paper contains 42 sections, 32 theorems, 315 equations, 3 figures, 1 algorithm.

Key Result

Theorem 1.1

Suppose $U \colon \mathbb{T}^d \to \mathbb{R}$ is a non-degenerate double-well function with wells of equal depth (but not necessarily the same shape). For $\varepsilon > 0$ let $Y_{\varepsilon, \cdot}$ be a solution to the overdamped Langevin equation, where $W$ is a standard $d$-dimensional Brownian motion on the torus. There exist constants $C_N, C_T$, depending on $U$ and $d$, such that the f
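
The theorem refers to the overdamped Langevin equation without displaying it. A standard form is given below, assuming the usual normalization in which $\varepsilon$ plays the role of the temperature and the invariant measure is the Gibbs measure $\pi_\varepsilon$; the paper's exact scaling may differ.

$$
dY_{\varepsilon,t} = -\nabla U(Y_{\varepsilon,t})\,dt + \sqrt{2\varepsilon}\,dW_t,
\qquad
\pi_\varepsilon(dx) \propto e^{-U(x)/\varepsilon}\,dx .
$$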

Figures (3)

  • Figure 1: Contour plot of the anisotropic Gaussian mixture in $\mathbb{R}^2$, defined in \ref{e:eqGaussian}, and used in experiments for Figure \ref{fig:stevN}.
  • Figure 2: Left: A Monte Carlo integral computed using ASMC, LMC, and quadrature in 2D. Shaded regions indicate the 25%-75% quantile range over 100 independent Monte Carlo runs. Right: A log-log plot of the mean error and standard deviation using ASMC as the number of particles varies.
  • Figure 3: Mean error of an integral in dimension $d=20$ computed using ASMC as $M, T$ vary, while holding $MT$ constant. Shaded regions indicate the 25%-75% quantile range. Left: A plot of the Monte Carlo integral vs. the number of iterations for a few values of $M$. Right: A plot of the mean error vs. $\log M$.

Theorems & Definitions (64)

  • Theorem 1.1
  • Remark 2.1
  • Theorem 2.2
  • Remark 2.3
  • Remark 2.4
  • Remark 2.5
  • Remark 2.6
  • Theorem 2.7
  • Proposition 3.1
  • Lemma 3.2
  • ...and 54 more