Concentration of the Langevin Algorithm's Stationary Distribution

Jason M. Altschuler, Kunal Talwar

TL;DR

For any nontrivial stepsize $\eta>0$, the stationary distribution $\pi_{\eta}$ of the Langevin Algorithm is shown to be sub-exponential when the potential is convex. Key to the analysis is the use of a rotation-invariant moment generating function (aka Bessel function) to study the stationary dynamics of the algorithm.

Abstract

A canonical algorithm for log-concave sampling is the Langevin Algorithm, aka the Langevin Diffusion run with some discretization stepsize $η> 0$. This discretization leads the Langevin Algorithm to have a stationary distribution $π_η$ which differs from the stationary distribution $π$ of the Langevin Diffusion, and it is an important challenge to understand whether the well-known properties of $π$ extend to $π_η$. In particular, while concentration properties such as isoperimetry and rapidly decaying tails are classically known for $π$, the analogous properties for $π_η$ are open questions with algorithmic implications. This note provides a first step in this direction by establishing concentration results for $π_η$ that mirror classical results for $π$. Specifically, we show that for any nontrivial stepsize $η> 0$, $π_η$ is sub-exponential (respectively, sub-Gaussian) when the potential is convex (respectively, strongly convex). Moreover, the concentration bounds we show are essentially tight. We also show that these concentration bounds extend to all iterates along the trajectory of the Langevin Algorithm, and to inexact implementations which use sub-Gaussian estimates of the gradient. Key to our analysis is the use of a rotation-invariant moment generating function (aka Bessel function) to study the stationary dynamics of the Langevin Algorithm. This technique may be of independent interest because it enables directly analyzing the discrete-time stationary distribution $π_η$ without going through the continuous-time stationary distribution $π$ as an intermediary.
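To make the gap between $π_η$ and $π$ concrete, here is a minimal Python sketch of the Langevin Algorithm, assuming the standard update $x \leftarrow x - \eta \nabla f(x) + \sqrt{2\eta}\, z$ with $z \sim N(0, I)$. The quadratic potential $f(x) = \|x\|^2/2$, the function names, and all parameter values are illustrative assumptions and are not taken from the paper.

```python
import random


def langevin_algorithm(grad, x0, eta, n_steps, rng):
    """Iterate x <- x - eta * grad(x) + sqrt(2 * eta) * z, with z ~ N(0, I)."""
    x = list(x0)
    noise_scale = (2.0 * eta) ** 0.5
    for _ in range(n_steps):
        g = grad(x)
        x = [xi - eta * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
    return x


def quadratic_grad(x):
    # Gradient of the illustrative potential f(x) = ||x||^2 / 2, for which pi = N(0, I).
    return list(x)


def stationary_variance_quadratic(eta):
    # For this potential the update is x <- (1 - eta) * x + sqrt(2 * eta) * z.
    # The stationary per-coordinate variance v solves v = (1 - eta)^2 * v + 2 * eta,
    # which gives v = 1 / (1 - eta / 2) for 0 < eta < 2.
    return 1.0 / (1.0 - eta / 2.0)


def empirical_variance(eta, n_chains=4000, n_steps=60, seed=0):
    # Run independent one-dimensional chains from x0 = 0 and estimate the variance
    # of the final iterate, which approximates the variance of pi_eta.
    rng = random.Random(seed)
    samples = [langevin_algorithm(quadratic_grad, [0.0], eta, n_steps, rng)[0]
               for _ in range(n_chains)]
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)
```

For this toy potential with $\eta = 0.5$, the closed form gives a stationary variance of $4/3$, whereas $π = N(0, 1)$ has variance $1$, so the discretization visibly changes the stationary distribution while leaving it Gaussian.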

Paper Structure

This paper contains 15 sections, 12 theorems, 46 equations.

Key Result

Lemma 3.2

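For any dimension $d \geqslant 2$ and argument $z > 0$,

$$\phi_d(z) = \Gamma(\alpha+1) \cdot \left(\frac{2}{z}\right)^{\alpha} \cdot I_{\alpha}(z),$$

where $\alpha := (d-2)/2$ and $I_{\alpha}$ denotes the modified Bessel function of the first kind. (In dimension $d=1$, we simply have $\phi_d(z) = \cosh(z)$.)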
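A quick numerical sanity check of this kind of formula can be sketched as follows, assuming $\phi_d(z)$ equals the standard modified-Bessel expression $\Gamma(\alpha+1)\,(2/z)^{\alpha}\,I_{\alpha}(z)$ with $\alpha = (d-2)/2$. Expanding $I_{\alpha}$ in its power series gives $\phi_d(z) = \sum_{k \geq 0} \frac{(z^2/4)^k}{k!\,(\alpha+1)(\alpha+2)\cdots(\alpha+k)}$, which the hypothetical helper `phi` below sums term by term. The helper name and the truncation length are assumptions made for illustration.

```python
import math


def phi(d, z, n_terms=80):
    """Truncated series for Gamma(a + 1) * (2 / z)**a * I_a(z), with a = (d - 2) / 2."""
    alpha = (d - 2) / 2.0
    term = 1.0   # k = 0 term of the series
    total = 1.0
    for k in range(1, n_terms):
        # Ratio of consecutive terms: (z^2 / 4) / (k * (alpha + k)).
        term *= (z * z / 4.0) / (k * (alpha + k))
        total += term
    return total


# Closed forms used for comparison:
#   d = 1 gives cosh(z), and d = 3 gives sinh(z) / z.
z = 1.7
gap_d1 = abs(phi(1, z) - math.cosh(z))
gap_d3 = abs(phi(3, z) - math.sinh(z) / z)
```

The series also covers $d = 1$ (where $\alpha = -1/2$), so the same helper reproduces the $\cosh(z)$ special case; both gaps above should be at the level of floating-point rounding.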

Theorems & Definitions (25)

  • Definition 3.1: Lyapunov function
  • Lemma 3.2: Explicit formula for Lyapunov function
  • proof
  • Lemma 3.3: Behavior of $\Phi$ under Gaussian convolution
  • proof
  • Lemma 3.4: Properties of rotation-invariant MGF
  • proof
  • Theorem 4.1: Sub-Gaussianity of $\pi_{\eta}$ for strongly convex potentials
  • Lemma 4.2: Contractivity of gradient descent step
  • proof: Proof of Theorem 4.1
  • ...and 15 more