Sharp convergence rates for Langevin dynamics in the nonconvex setting

Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, Michael I. Jordan

TL;DR

This work analyzes Langevin MCMC for sampling from p^*(x) ∝ e^{-U(x)} when U is globally L-smooth and m-strongly convex outside a ball of radius R, but potentially nonconvex inside it. Using a coupling-based framework with a carefully constructed Lyapunov function, the authors derive sharp nonasymptotic convergence rates in 1-Wasserstein distance for both overdamped and underdamped Langevin MCMC. The overdamped method requires Õ(e^{cLR^2} d/ε^2) iterations, while the underdamped method requires Õ(e^{cLR^2} √d/ε). The exponential dependence on LR^2 captures the scale of the nonconvexity, while the dependence on dimension is only polynomial, so both methods remain tractable in high dimensions. The analysis combines synchronous and reflection couplings, a smoothed distance function f, and control of the discretization error, culminating in a contraction of a Lyapunov function that translates into Wasserstein convergence. These results clarify how confining the nonconvexity to a bounded region affects mixing, and they offer practical guidance for choosing step sizes and iteration counts in nonconvex sampling tasks.
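The TL;DR mentions combining synchronous and reflection couplings. A minimal sketch of one coupled step of two overdamped Langevin chains follows. The function name `coupled_langevin_step` and the rule of switching to reflection whenever the chains are farther apart than a hypothetical `reflect_threshold` are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def coupled_langevin_step(x, y, grad_U, step, rng, reflect_threshold):
    """Advance two overdamped Langevin chains by one step with coupled noise.

    Hypothetical switching rule: if the chains are farther apart than
    reflect_threshold, use a reflection coupling (y's Gaussian increment is
    x's increment mirrored across the hyperplane orthogonal to x - y);
    otherwise use a synchronous coupling (both chains share the increment).
    """
    xi = rng.standard_normal(x.shape)
    diff = x - y
    dist = np.linalg.norm(diff)
    if dist > reflect_threshold:
        e = diff / dist
        xi_y = xi - 2.0 * np.dot(e, xi) * e  # (I - 2 e e^T) xi
    else:
        xi_y = xi  # synchronous: identical noise
    scale = np.sqrt(2.0 * step)
    x_new = x - step * grad_U(x) + scale * xi
    y_new = y - step * grad_U(y) + scale * xi_y
    return x_new, y_new

# Usage on a hypothetical quadratic potential U(z) = |z|^2 / 2.
rng = np.random.default_rng(0)
x, y = np.array([3.0, 0.0]), np.array([-3.0, 0.0])
for _ in range(200):
    x, y = coupled_langevin_step(x, y, lambda z: z, 0.05, rng, reflect_threshold=0.5)
print("distance after 200 steps:", np.linalg.norm(x - y))
```

Under the reflection coupling, the chains' noise components orthogonal to x - y cancel, so the distance between them evolves as a one-dimensional process that can hit zero. Under the synchronous coupling, the noise cancels entirely, and any contraction comes from the drift alone.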

Abstract

We study the problem of sampling from a distribution $p^*(x) \propto \exp\left(-U(x)\right)$, where the function $U$ is $L$-smooth everywhere and $m$-strongly convex outside a ball of radius $R$, but potentially nonconvex inside this ball. We study both overdamped and underdamped Langevin MCMC and establish upper bounds on the number of steps required to obtain a sample from a distribution that is within $ε$ of $p^*$ in $1$-Wasserstein distance. For the first-order method (overdamped Langevin MCMC), the iteration complexity is $\tilde{\mathcal{O}}\left(e^{cLR^2}d/ε^2\right)$, where $d$ is the dimension of the underlying space. For the second-order method (underdamped Langevin MCMC), the iteration complexity is $\tilde{\mathcal{O}}\left(e^{cLR^2}\sqrt{d}/ε\right)$ for an explicit positive constant $c$. Surprisingly, the iteration complexity for both these algorithms is only polynomial in the dimension $d$ and the target accuracy $ε$. It is exponential, however, in the problem parameter $LR^2$, which is a measure of non-log-concavity of the target distribution.
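A minimal sketch of both samplers, under stated assumptions. The target is a hypothetical one-dimensional mixture of two unit-variance Gaussians centred at ±2. Its potential is smooth, nonconvex near the origin (U''(0) = 1 - 4 < 0), and strongly convex for large |x|. The overdamped update is the standard Langevin MCMC step. The underdamped update is a simple semi-implicit Euler scheme swapped in for readability, not the discretization analyzed in the paper. W_1 is estimated from sorted samples, which is exact for two equal-size empirical distributions on the line.

```python
import numpy as np

MU = 2.0  # hypothetical mode location; |MU| > 1 makes U nonconvex near 0

def grad_U(x):
    # U(x) = -log(0.5 N(x; MU, 1) + 0.5 N(x; -MU, 1)) up to a constant,
    # so grad U(x) = x - MU * tanh(MU * x).
    return x - MU * np.tanh(MU * x)

def exact_samples(n, rng):
    # Exact draws from p*: pick a mode with probability 1/2, add N(0, 1) noise.
    signs = rng.choice([-1.0, 1.0], size=n)
    return signs * MU + rng.standard_normal(n)

def overdamped_lmc(x, step, n_steps, rng):
    # Overdamped Langevin MCMC: x <- x - step * grad U(x) + sqrt(2 step) * xi
    for _ in range(n_steps):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

def underdamped_lmc(x, v, step, n_steps, rng, gamma=2.0, u=1.0):
    # Semi-implicit Euler discretization (a stand-in, not the paper's scheme) of
    #   dv = -gamma v dt - u grad U(x) dt + sqrt(2 gamma u) dB,   dx = v dt.
    for _ in range(n_steps):
        noise = rng.standard_normal(v.shape)
        v = v - step * (gamma * v + u * grad_U(x)) + np.sqrt(2.0 * gamma * u * step) * noise
        x = x + step * v
    return x

def w1_1d(a, b):
    # 1-Wasserstein distance between two equal-size 1D empirical distributions.
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

rng = np.random.default_rng(0)
n, step, n_steps = 4000, 0.01, 5000
reference = exact_samples(n, rng)
x_over = overdamped_lmc(np.zeros(n), step, n_steps, rng)
x_under = underdamped_lmc(np.zeros(n), np.zeros(n), step, n_steps, rng)
print("W1 overdamped :", w1_1d(x_over, reference))
print("W1 underdamped:", w1_1d(x_under, reference))
```

All chains are run in parallel as a vector, and the reported W_1 mixes discretization bias with Monte Carlo error from the finite sample size, so it is only a rough check rather than a measurement of the rates above.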

Paper Structure

This paper contains 30 sections, 40 theorems, 263 equations, 1 figure, 2 algorithms.

Key Result

Theorem 1

Given a potential $U$ that is $L$-smooth everywhere and strongly convex outside a ball of radius $R$, we can output a sample from a distribution that is $\varepsilon$-close to $p^*(x)\propto\exp\left(-U(x)\right)$ in $W_1$ distance by running $\widetilde{\mathcal{O}}\left(e^{cLR^2}d/\varepsilon^2\right)$ steps of overdamped Langevin MCMC.
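To compare the Theorem 1 rate with the underdamped rate quoted in the abstract, the sketch below evaluates both bounds with the constants and logarithmic factors hidden by $\widetilde{\mathcal{O}}$ dropped. The helper names and the value c = 1 are hypothetical placeholders. The paper's explicit constant is not reproduced here, so only ratios between the printed numbers are meaningful.

```python
import math

def overdamped_bound(L, R, d, eps, c):
    """e^{c L R^2} * d / eps^2: the overdamped rate with O~ factors dropped."""
    return math.exp(c * L * R**2) * d / eps**2

def underdamped_bound(L, R, d, eps, c):
    """e^{c L R^2} * sqrt(d) / eps: the underdamped rate with O~ factors dropped."""
    return math.exp(c * L * R**2) * math.sqrt(d) / eps

# c = 1.0 is a placeholder, not the paper's explicit constant.
c = 1.0
for d, eps in [(10, 0.1), (100, 0.01), (1000, 0.01)]:
    over = overdamped_bound(1.0, 1.0, d, eps, c)
    under = underdamped_bound(1.0, 1.0, d, eps, c)
    print(f"d={d:5d} eps={eps:<5} overdamped~{over:.3g} underdamped~{under:.3g}")

# Enlarging the nonconvex region from R = 1 to R = 2 (with L = 1)
# multiplies both bounds by e^{3c}.
print(overdamped_bound(1.0, 2.0, 100, 0.01, c) / overdamped_bound(1.0, 1.0, 100, 0.01, c))
```

Because the ratio of the two bounds is $\sqrt{d}/\varepsilon$, the advantage of the underdamped method grows with both the dimension and the target accuracy. Neither method escapes the $e^{cLR^2}$ factor.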

Theorems & Definitions (40)

  • Theorem 1: informal
  • Theorem 2
  • Theorem 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Lemma 7
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • ...and 30 more