Sampling from multimodal distributions with warm starts: Non-asymptotic bounds for the Reweighted Annealed Leap-Point Sampler
Holden Lee, Matheau Santana-Gijzen
TL;DR
Re-ALPS tackles multimodal sampling by leveraging warm starts without relying on Hessian-based Gaussian approximations. It introduces a tilted, mixture-based tempering scheme with dynamically estimated modal and level weights, plus a continuous-time analysis that supports non-asymptotic, polynomial-time total variation (TV) bounds. The core contributions are a convergence analysis based on Markov chain decomposition, inductive control of partition-function estimation, and practical weight balancing via Monte Carlo estimates. Empirically, Re-ALPS shows improved cross-mode mixing over the original ALPS on a challenging three-mode heavy-tailed example, highlighting the method's potential when Hessian information is unreliable.
Abstract
Sampling from multimodal distributions is a central challenge in Bayesian inference and machine learning. Hardness results show that classical MCMC methods, even with tempering, can suffer from exponential mixing times. A natural question is therefore how to leverage additional information, such as a warm start point for each mode, to enable faster mixing across modes. To address this, we introduce Reweighted ALPS (Re-ALPS), a modified version of the Annealed Leap-Point Sampler (ALPS) that dispenses with the Gaussian approximation assumption. We prove the first polynomial-time bound that works in a general setting, under a natural assumption that each component contains significant mass relative to the others when tilted towards the corresponding warm start point. As in ALPS, we define distributions tilted towards a mixture centered at the warm start points and, at the coldest level, use teleportation between warm start points to enable efficient mixing across modes. In contrast to ALPS, our method does not require Hessian information at the modes, but instead estimates component partition functions via Monte Carlo. This additional estimation step is crucial in allowing the algorithm to handle target distributions whose geometry is more complex than approximately Gaussian. For the proof, we establish convergence results for Markov processes in which only part of the stationary distribution mixes well, and we analyze the estimation of partition functions for individual components of a mixture. We numerically compare our algorithm's mixing performance with that of ALPS on a mixture of heavy-tailed distributions.
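To make the idea of estimating partition functions via Monte Carlo concrete, the following is a minimal sketch of the generic importance-sampling identity Z_to / Z_from = E_{p_from}[f_to(x) / f_from(x)], where f denotes an unnormalized density. It is an illustration under stated assumptions and not the estimator used in Re-ALPS: the one-dimensional Gaussian toy family exp(-beta * x^2 / 2), the function names, and the inverse temperatures are all hypothetical choices made so that the true ratio is known in closed form.

```python
import math
import random


def log_unnormalized(x, beta):
    """Log of the toy unnormalized density exp(-beta * x^2 / 2) (an assumption)."""
    return -0.5 * beta * x * x


def estimate_partition_ratio(samples, beta_from, beta_to):
    """Estimate Z(beta_to) / Z(beta_from) from samples drawn at level beta_from.

    Averages the importance weights f_to(x) / f_from(x) over the samples.
    """
    weights = [
        math.exp(log_unnormalized(x, beta_to) - log_unnormalized(x, beta_from))
        for x in samples
    ]
    return sum(weights) / len(weights)


# Hypothetical pair of adjacent levels; here exact sampling at beta_from is
# possible because the normalized toy density is N(0, 1 / beta_from).
rng = random.Random(0)
beta_from, beta_to = 1.0, 1.5
samples = [rng.gauss(0.0, 1.0 / math.sqrt(beta_from)) for _ in range(200_000)]

ratio_hat = estimate_partition_ratio(samples, beta_from, beta_to)
# For this toy family Z(beta) = sqrt(2 * pi / beta), so the true ratio is
# sqrt(beta_from / beta_to), roughly 0.816.
ratio_true = math.sqrt(beta_from / beta_to)
print(f"estimate = {ratio_hat:.4f}, exact = {ratio_true:.4f}")
```

In this toy setting the weights are bounded by 1, so the estimator has low variance. When adjacent levels differ more, the weights can become heavy-tailed and the plain average becomes unreliable, which is one reason such ratios are typically estimated between nearby levels.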
