Sampling from multimodal distributions with warm starts: Non-asymptotic bounds for the Reweighted Annealed Leap-Point Sampler

Holden Lee, Matheau Santana-Gijzen

TL;DR

Re-ALPS tackles multimodal sampling by leveraging warm starts without relying on Hessian-based Gaussian approximations. It introduces a tilted, mixture-based tempering scheme with dynamically estimated modal and level weights, plus a continuous-time analysis that supports non-asymptotic polynomial-time TV bounds. The core contributions are a Markov-decomposition-based convergence analysis, inductive control of partition-function estimation, and practical weight-balancing via Monte Carlo estimates. Empirically, Re-ALPS demonstrates improved cross-mode mixing over the original ALPS on a challenging three-mode heavy-tailed example, highlighting the method's potential when Hessian information is unreliable.

Abstract

Sampling from multimodal distributions is a central challenge in Bayesian inference and machine learning. In light of hardness results for sampling (classical MCMC methods, even with tempering, can suffer from exponential mixing times), a natural question is how to leverage additional information, such as a warm start point for each mode, to enable faster mixing across modes. To address this, we introduce Reweighted ALPS (Re-ALPS), a modified version of the Annealed Leap-Point Sampler (ALPS) that dispenses with the Gaussian approximation assumption. We prove the first polynomial-time bound that works in a general setting, under a natural assumption that each component contains significant mass relative to the others when tilted towards the corresponding warm start point. As in ALPS, we define distributions tilted towards a mixture centered at the warm start points and, at the coldest level, use teleportation between warm start points to enable efficient mixing across modes. In contrast to ALPS, our method does not require Hessian information at the modes, but instead estimates component partition functions via Monte Carlo. This additional estimation step is crucial in allowing the algorithm to handle target distributions whose geometry is more complex than approximately Gaussian. For the proof, we show convergence results for Markov processes when only part of the stationary distribution is well-mixing, and estimation results for the partition functions of individual components of a mixture. We numerically evaluate our algorithm's mixing performance compared to ALPS on a mixture of heavy-tailed distributions.
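
The abstract says that component partition functions are estimated via Monte Carlo but does not give the estimator. As an illustration only, the sketch below uses the standard importance-sampling identity $Z_b / Z_a = \mathbb{E}_{x \sim p_a}[\tilde p_b(x) / \tilde p_a(x)]$ for two unnormalized densities $\tilde p_a, \tilde p_b$. The function name `estimate_partition_ratio`, the one-dimensional Gaussian toy densities, and the sample size are all assumptions made for this example, not the paper's procedure.

```python
import math
import random


def estimate_partition_ratio(sample_from_a, log_unnorm_a, log_unnorm_b, n_samples):
    """Estimate Z_b / Z_a as the sample mean of p~_b(x) / p~_a(x) with x ~ p_a.

    Hypothetical helper for illustration; assumes p_a covers the support of p_b.
    """
    total = 0.0
    for _ in range(n_samples):
        x = sample_from_a()
        # Work with log-densities and exponentiate the difference.
        total += math.exp(log_unnorm_b(x) - log_unnorm_a(x))
    return total / n_samples


# Toy check: unnormalized Gaussians exp(-x^2 / (2 sigma^2)) have Z = sigma * sqrt(2 pi),
# so the true ratio Z_b / Z_a equals sigma_b / sigma_a.
rng = random.Random(0)
sigma_a, sigma_b = 1.0, 0.8  # target is narrower, so the weights are bounded by 1


def log_unnorm_a(x):
    return -0.5 * (x / sigma_a) ** 2


def log_unnorm_b(x):
    return -0.5 * (x / sigma_b) ** 2


ratio_hat = estimate_partition_ratio(
    lambda: rng.gauss(0.0, sigma_a), log_unnorm_a, log_unnorm_b, 100_000
)
ratio_true = sigma_b / sigma_a
print(f"estimated ratio: {ratio_hat:.4f}, true ratio: {ratio_true:.4f}")
```

In this toy case exact samples from $p_a$ are available; in a sampler they would come from the chain at the previous level, so the quality of the estimate depends on how well that level has mixed.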

Paper Structure

This paper contains 36 sections, 42 theorems, 200 equations, 4 figures, 1 table, 3 algorithms.

Key Result

Proposition 3.2

(Tempering by Gaussians) Let the paper's assumptions hold for $\pi(x) = \sum_{k=1}^{M}\alpha_k\pi_k(x)$ with $\alpha_k\pi_k(x) = e^{-f_k(x)}$, where each $f_k(x)$ is $L$-smooth. In addition, assume that a log-Sobolev inequality holds with constant $C_{LS}$ for $\pi_{i,j,k} \propto \pi_j(x)\cdot q_i(x-x_k)$

Figures (4)

  • Figure 1: Unscaled modal weights ($Z_{ik}$). The different node sizes show the imbalance, creating bottlenecks that hinder sampling.
  • Figure 2: Scaled modal weights ($W_{ik} = w_{ik}Z_{ik}$). After re-weighting, all components are balanced, represented by the uniform node size.
  • Figure 3: An illustration of good (solid) and bad (dashed) mixture components. The good components are sharply peaked around the warm-start locations, while the bad pseudo-modes from cross-terms have negligible mass.
  • Figure 4: Samples drawn from $\pi(x; \sigma_1 = 0.2, \sigma_2 = 20)$ with $d=5$.
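
The re-weighting described in the captions of Figures 1 and 2 can be sketched as follows: given estimates of the unscaled modal weights $Z_{ik}$, choose $w_{ik} \propto 1/Z_{ik}$ so that every scaled weight $W_{ik} = w_{ik}Z_{ik}$ is equal. This is a minimal sketch; the helper name `balancing_weights`, the example values, and the choice to normalize the $W_{ik}$ to sum to 1 are assumptions for illustration and are not taken from the paper.

```python
def balancing_weights(z_hat):
    """Return w with w[i][k] * z_hat[i][k] identical for every (level i, mode k).

    Hypothetical helper: z_hat[i][k] is an estimate of Z_ik, assumed positive.
    The scaled weights are normalized to sum to 1 (an illustrative convention).
    """
    n_cells = sum(len(row) for row in z_hat)
    return [[1.0 / (n_cells * z) for z in row] for row in z_hat]


# Made-up, imbalanced estimates for 2 levels and 3 modes.
z_hat = [
    [0.5, 2.0, 8.0],
    [1.0, 4.0, 0.25],
]
w = balancing_weights(z_hat)

# Scaled weights W_ik = w_ik * Z_ik; after balancing they should all coincide.
scaled = [
    [w_ik * z_ik for w_ik, z_ik in zip(w_row, z_row)]
    for w_row, z_row in zip(w, z_hat)
]
for row in scaled:
    print(row)
```

Because the $Z_{ik}$ are only estimated, the balance achieved in practice is approximate: a relative error in an estimate of $Z_{ik}$ carries over directly to the corresponding $W_{ik}$.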

Theorems & Definitions (85)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Example 2.1: Low acceptance for teleporting
  • Example 2.2: Bottlenecks with tempering
  • Proposition 3.2
  • Theorem 3.3
  • Definition 4.1
  • Definition 4.2
  • Lemma 4.3
  • ...and 75 more