Sharp convergence rates for Langevin dynamics in the nonconvex setting
Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, Michael I. Jordan
TL;DR
This work analyzes Langevin MCMC for sampling from p^*(x) ∝ e^{-U(x)} when U is globally L-smooth and m-strongly convex outside a ball of radius R, but potentially nonconvex inside. Using a coupling-based framework with a carefully constructed Lyapunov function, the authors derive nonasymptotic convergence rates in 1-Wasserstein distance for both overdamped and underdamped Langevin MCMC: overdamped requires Õ(e^{cLR^2} d / ε^2) iterations, while underdamped requires Õ(e^{cLR^2} √d / ε). The exponential dependence on LR^2 reflects the degree of nonconvexity, while the dependence on the dimension d and the accuracy ε remains polynomial, so the bounds stay tractable in high dimensions when LR^2 is moderate. The analysis combines synchronous and reflection couplings, a smoothed distance function f, and control of the discretization error; together these give a contraction of a Lyapunov function, which translates into Wasserstein convergence. The results quantify how nonconvexity confined to a ball affects mixing, and they offer guidance for choosing step sizes and iteration counts in nonconvex sampling tasks.
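To make the overdamped algorithm concrete, here is a minimal Python sketch of its usual Euler-Maruyama discretization. It is an illustration under stated assumptions, not the authors' implementation: the target is an equal-weight mixture of two unit-variance Gaussians at ±mu, chosen because its potential is smooth, nonconvex near the origin when ‖mu‖ > 1, and grows quadratically far away. The names `grad_U`, `overdamped_langevin`, `mu`, and `step`, as well as the step size and iteration count, are illustrative choices and are not taken from the paper.

```python
import numpy as np


def grad_U(x, mu):
    """Gradient of U(x) = ||x||^2 / 2 - log cosh(<mu, x>) + const.

    This U is the potential of an equal-weight mixture of N(mu, I) and
    N(-mu, I); it is an assumed example target, not one from the paper.
    """
    return x - mu * np.tanh(x @ mu)


def overdamped_langevin(grad, x0, step, n_steps, rng):
    """Iterate x <- x - step * grad(x) + sqrt(2 * step) * xi, xi ~ N(0, I)."""
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        noise = rng.standard_normal(x.size)
        x = x - step * grad(x) + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples


rng = np.random.default_rng(0)
mu = np.array([2.0, 0.0])          # modes at (+2, 0) and (-2, 0)
samples = overdamped_langevin(
    lambda x: grad_U(x, mu),       # gradient oracle for the assumed target
    x0=np.zeros(2),
    step=0.05,                     # illustrative step size, not a tuned value
    n_steps=20000,
    rng=rng,
)
```

The chain is not Metropolis-adjusted, so its stationary distribution is biased by the step size; the iteration bounds quoted above are about how small the step must be, and how many steps are needed, to bring that bias and the mixing error below ε.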
Abstract
We study the problem of sampling from a distribution $p^*(x) \propto \exp\left(-U(x)\right)$, where the function $U$ is $L$-smooth everywhere and $m$-strongly convex outside a ball of radius $R$, but potentially nonconvex inside this ball. We study both overdamped and underdamped Langevin MCMC and establish upper bounds on the number of steps required to obtain a sample from a distribution that is within $ε$ of $p^*$ in $1$-Wasserstein distance. For the first-order method (overdamped Langevin MCMC), the iteration complexity is $\tilde{\mathcal{O}}\left(e^{cLR^2}d/ε^2\right)$, where $d$ is the dimension of the underlying space. For the second-order method (underdamped Langevin MCMC), the iteration complexity is $\tilde{\mathcal{O}}\left(e^{cLR^2}\sqrt{d}/ε\right)$ for an explicit positive constant $c$. Surprisingly, the iteration complexity for both these algorithms is only polynomial in the dimension $d$ and the target accuracy $ε$. It is exponential, however, in the problem parameter $LR^2$, which is a measure of non-log-concavity of the target distribution.
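For reference, the two dynamics named in the abstract can be written in a standard textbook form as below. This is a sketch under assumptions: the step size $\delta$, friction $\gamma$, inverse mass $u$, velocity $v_t$, and Gaussian increments $\xi_k$ are notation introduced here, and the paper's own scaling of the underdamped dynamics may differ.

```latex
\begin{align*}
\text{overdamped diffusion:}\quad
  & dx_t = -\nabla U(x_t)\,dt + \sqrt{2}\,dB_t, \\
\text{overdamped MCMC step:}\quad
  & x_{k+1} = x_k - \delta\,\nabla U(x_k) + \sqrt{2\delta}\,\xi_k,
    \qquad \xi_k \sim \mathcal{N}(0, I_d), \\
\text{underdamped diffusion:}\quad
  & dx_t = v_t\,dt, \qquad
    dv_t = -\gamma v_t\,dt - u\,\nabla U(x_t)\,dt + \sqrt{2\gamma u}\,dB_t.
\end{align*}
```

In this form the overdamped diffusion has stationary density $p^*(x) \propto \exp\left(-U(x)\right)$, and the underdamped diffusion has stationary density proportional to $\exp\left(-U(x) - \|v\|^2/(2u)\right)$, whose $x$-marginal is again $p^*$.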
