Provable Convergence and Limitations of Geometric Tempering for Langevin Dynamics

Omar Chehab, Anna Korba, Austin Stromme, Adrien Vacher

TL;DR

This work analyzes geometric tempering for Langevin-based sampling, providing the first KL-based convergence bounds for tempered Langevin dynamics along a general tempering path and revealing that the method can both improve and degrade convergence depending on problem conditioning. It derives continuous- and discrete-time upper bounds that depend on the log-Sobolev constants of intermediate tempered distributions and characterizes an optimal tempering schedule in the strongly log-concave regime. Crucially, the paper shows that geometric tempering can exponentially worsen functional inequalities and, in certain multi- or even unimodal settings, lead to exponential slowdowns in convergence, challenging the assumption that tempering always helps. The results underscore the need for careful choice of tempering paths and schedules and motivate exploring alternative moving-target strategies beyond the geometric path. Overall, the findings provide a nuanced, theory-backed view of when geometric tempering is beneficial and when it may be detrimental for Langevin sampling.

Abstract

Geometric tempering is a popular approach to sampling from challenging multi-modal probability distributions by instead sampling from a sequence of distributions which interpolate, using the geometric mean, between an easier proposal distribution and the target distribution. In this paper, we theoretically investigate the soundness of this approach when the sampling algorithm is Langevin dynamics, proving both upper and lower bounds. Our upper bounds are the first analysis in the literature under functional inequalities. They assert the convergence of tempered Langevin in continuous and discrete-time, and their minimization leads to closed-form optimal tempering schedules for some pairs of proposal and target distributions. Our lower bounds demonstrate a simple case where the geometric tempering takes exponential time, and further reveal that the geometric tempering can suffer from poor functional inequalities and slow convergence, even when the target distribution is well-conditioned. Overall, our results indicate that geometric tempering may not help, and can even be harmful for convergence.
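As a reading aid, the geometric path described in the abstract can be written out explicitly. The symbols $\nu$ (proposal), $\pi$ (target) and $\lambda$ (tempering parameter) follow the figure captions on this page; the names $\mu_\lambda$, $X_t$ and $B_t$ are assumed here for illustration and may differ from the paper's notation.

```latex
% Geometric mean of proposal and target, normalized to a probability density
\mu_\lambda(x) \;\propto\; \nu(x)^{1-\lambda}\,\pi(x)^{\lambda},
\qquad \lambda \in [0,1].

% Its score is a convex combination of the two scores
\nabla \log \mu_\lambda(x)
  \;=\; (1-\lambda)\,\nabla \log \nu(x) \;+\; \lambda\,\nabla \log \pi(x).

% Langevin dynamics with a moving target, for a schedule t \mapsto \lambda_t
\mathrm{d}X_t \;=\; \nabla \log \mu_{\lambda_t}(X_t)\,\mathrm{d}t \;+\; \sqrt{2}\,\mathrm{d}B_t .
```

At $\lambda = 0$ the path starts at the proposal and at $\lambda = 1$ it reaches the target. The normalizing constant of $\mu_\lambda$ never has to be computed, since only the score enters the dynamics.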

Paper Structure

This paper contains 55 sections, 25 theorems, 213 equations, 4 figures.

Key Result

Theorem 1

Suppose Assumptions assump:reg and assump:lipschitz_dissipative hold, and let $(\alpha_t)_{t \geq 0}$ be the inverse log-Sobolev constants as in Eq. eqn:alpha_t, assumed to be integrable. Let $p_t$ be the law of Eq. eq:continuous_time_dynamic with initialization $p_0$ and denote by $\dot{\lambda}_t$ where $A = 2(L_\pi + L_\nu) ( \frac{2(d + b_\nu + b_\pi)}{a_\nu \land a_\pi} + \mathbb{E}_{p_0}[ \

Figures (4)

  • Figure 1: Bi-modal intermediate distribution at $\lambda = 0.45$ with $\nu, \pi$ as in Eq. \ref{eqn:unimodal_small} for $m = 10$.
  • Figure 2: Value of the upper bound $G$ using the optimal tempering, linear tempering, and standard Langevin.
  • Figure 3: Numerical validation of the rate of convergence predicted by Proposition \ref{prop:linear_tempering}. Dashed lines are our predictions from Proposition \ref{prop:linear_tempering} and solid lines are from simulations of the process using 10 000 particles. The proposal and target are two-dimensional Gaussians with zero mean and covariance matrices that have a constant diagonal, equal to 1 for the proposal and 10 for the target. As expected, for large values of time, the predicted rate from Proposition \ref{prop:linear_tempering} (dashed red) matches the approximated upper bound from Theorem \ref{th:annealed_langevin_discrete_time} (dashed yellow) as well as the particle-based simulations (solid lines).
  • Figure 4: Geometric path from a Gaussian to a Gaussian mixture. We observe that this path displaces mass "horizontally" to the nearest modes (left columns), and then "vertically" to the remaining modes (right columns). Intuitively, this second part is problematic for a Langevin sampler.
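The Gaussian setup described for Figure 3 (zero-mean two-dimensional Gaussians with diagonal covariance 1 for the proposal and 10 for the target, 10 000 particles) is simple enough to simulate in a few lines. The following is a minimal sketch, not the authors' code: the Euler-Maruyama (unadjusted Langevin) discretization, the step size, the linear ramp followed by a hold at $\lambda = 1$, and all function and variable names are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_score(x, var):
    """Score (gradient of the log-density) of the centered Gaussian N(0, var * I)."""
    return -x / var

def tempered_langevin(x, schedule, step, var_nu, var_pi, rng):
    """Unadjusted Langevin steps whose target moves along the geometric path."""
    for lam in schedule:
        # The score of nu^(1 - lam) * pi^lam is a convex combination of the two scores.
        score = (1.0 - lam) * gaussian_score(x, var_nu) + lam * gaussian_score(x, var_pi)
        x = x + step * score + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
n_particles, dim = 10_000, 2
var_nu, var_pi = 1.0, 10.0  # proposal and target variances, as in the Figure 3 caption

# Particles start from the proposal nu = N(0, I).
x0 = np.sqrt(var_nu) * rng.standard_normal((n_particles, dim))

# Assumed schedule: linear ramp of lambda from 0 to 1, then hold at lambda = 1.
schedule = np.concatenate([np.linspace(0.0, 1.0, 1000), np.ones(2000)])

samples = tempered_langevin(x0, schedule, step=0.05,
                            var_nu=var_nu, var_pi=var_pi, rng=rng)
print("empirical variance per coordinate:", samples.var(axis=0))
```

With these settings the empirical per-coordinate variance should end close to the target value of 10, up to Monte Carlo error and the small bias of the discretization. In this Gaussian case each intermediate distribution is itself a centered Gaussian with precision $(1-\lambda) + \lambda/10$, so the path stays unimodal and the sketch does not exhibit the slowdowns discussed in the abstract.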

Theorems & Definitions (46)

  • Theorem 1: Continuous time
  • Remark 2: Recovering standard upper bound for Langevin dynamics without tempering
  • Theorem 3: Discrete time
  • Theorem 4
  • Corollary 5
  • Proposition 6
  • Proposition 7
  • Theorem 8
  • Theorem 9
  • Lemma 10: Discrepancy between the laws of the sampling process and tempering path
  • ...and 36 more