
Convergence of Langevin MCMC in KL-divergence

Xiang Cheng, Peter Bartlett

TL;DR

The paper proves nonasymptotic convergence of discretized Langevin MCMC, in KL-divergence, to a strongly log-concave target under L-smoothness and m-strong convexity. It gives explicit step-size and iteration choices that reach KL accuracy ε in Õ(d/ε) steps. It recasts Langevin diffusion as a gradient flow in the space of probability measures, which yields a unified KL-based analysis from which total variation and 2-Wasserstein convergence follow as corollaries. It also treats the case where strong convexity is absent, giving KL and TV guarantees under weaker assumptions, and supplies supplementary proofs and regularity lemmas. In this framework, KL dissipation drives convergence, which connects sampling theory with optimal-transport geometry. Practically, the results give concrete step-size and iteration choices for reaching a prescribed KL accuracy when sampling in high dimensions with discretized Langevin dynamics.
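One standard way a KL bound transfers to weaker metrics is sketched below. This assumes the corollaries go through Pinsker's inequality and Talagrand's transport inequality for an $m$-strongly log-concave target; the paper's exact constants and route may differ.

```latex
\mathrm{TV}(p, p^*) \le \sqrt{\tfrac{1}{2}\,\mathrm{KL}(p \,\|\, p^*)},
\qquad
W_2(p, p^*) \le \sqrt{\tfrac{2}{m}\,\mathrm{KL}(p \,\|\, p^*)}
```

Under these inequalities, $\mathrm{KL}(p\,\|\,p^*) \le \varepsilon$ gives $\mathrm{TV}(p, p^*) \le \sqrt{\varepsilon/2}$ and $W_2(p, p^*) \le \sqrt{2\varepsilon/m}$.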

Abstract

Langevin diffusion is a commonly used tool for sampling from a given distribution. In this work, we establish that when the target density $p^*$ is such that $-\log p^*$ is $L$-smooth and $m$-strongly convex, discrete Langevin diffusion produces a distribution $p$ with $\mathrm{KL}(p\,\|\,p^*)\leq \varepsilon$ in $\tilde{O}(d/\varepsilon)$ steps, where $d$ is the dimension of the sample space. We also study the convergence rate when the strong-convexity assumption is absent. By considering the Langevin diffusion as a gradient flow in the space of probability distributions, we obtain an elegant analysis that applies to the stronger property of convergence in KL-divergence and gives a conceptually simpler proof of the best-known convergence results in weaker metrics.
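Discrete Langevin diffusion is commonly implemented as the unadjusted Langevin algorithm (ULA), $x_{k+1} = x_k + h\,\nabla \log p^*(x_k) + \sqrt{2h}\,\xi_k$. The NumPy sketch below assumes that standard update with a constant step size $h$ and does not reproduce the paper's prescribed step size or iteration count; `ula_sample` and `gaussian_ula_kl` are hypothetical helper names. An isotropic Gaussian target is used only because every iterate then stays Gaussian, so $\mathrm{KL}(p_k\,\|\,p^*)$ has a closed form that makes the step-size bias, and its linear growth in $d$ for this example, directly visible.

```python
import numpy as np


def ula_sample(grad_log_p, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm: Euler-Maruyama steps of Langevin diffusion.

    Iterates x_{k+1} = x_k + step * grad_log_p(x_k) + sqrt(2 * step) * xi_k,
    with xi_k ~ N(0, I). Each row of x0 is treated as an independent chain.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step * grad_log_p(x) + np.sqrt(2.0 * step) * noise
    return x


def gaussian_ula_kl(d, m, step, n_steps, var0=1.0):
    """Exact KL(p_k || p*) for ULA on the isotropic Gaussian target p* = N(0, I/m).

    Started from N(0, var0 * I), every iterate stays N(0, v_k * I) with
    v_{k+1} = (1 - step * m)**2 * v_k + 2 * step, and
    KL(N(0, v I) || N(0, I/m)) = (d / 2) * (r - 1 - log r) with r = v * m.
    """
    v = var0
    for _ in range(n_steps):
        v = (1.0 - step * m) ** 2 * v + 2.0 * step
    r = v * m  # ratio of iterate variance to target variance 1/m
    return 0.5 * d * (r - 1.0 - np.log(r))


# Hypothetical usage: target N(0, I/m), so grad log p*(x) = -m * x.
rng = np.random.default_rng(0)
m, step, d = 2.0, 0.05, 5
samples = ula_sample(lambda x: -m * x, np.zeros((10_000, d)), step, 500, rng)
print(samples.var())                          # near 2 / (m * (2 - step * m)), not 1 / m
print(gaussian_ula_kl(d, m, step, 500))       # small but nonzero discretization bias
print(gaussian_ula_kl(d, m, step / 2, 2000))  # smaller step, smaller KL
```

In this Gaussian case the stationary variance $2/(m(2 - hm))$ exceeds the target variance $1/m$, so a fixed step leaves a KL floor that shrinks as $h$ decreases. The paper's bounds concern general strongly log-concave targets and are not derived from this special case.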


Paper Structure

This paper contains 16 sections, 19 theorems, 47 equations, 1 table.

Key Result

Lemma 1

For any ${\boldsymbol{\mu}}\in \mathscr{P}(\mathbb{R}^d)$, let $\frac{\delta F}{\delta {\boldsymbol{\mu}}}({\boldsymbol{\mu}}): \mathbb{R}^d \to \mathbb{R}$ be the first variation of $F$ at ${\boldsymbol{\mu}}$, defined as $\left(\frac{\delta F}{\delta {\boldsymbol{\mu}}}({\boldsymbol{\mu}})\right)(\cdot)$ … For any curve ${\boldsymbol{\mu}}_t : \mathbb{R}^+ \to \mathscr{P}(\mathbb{R}^d)$, and for any $v$ …
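For orientation, here is a sketch of the standard calculation that a chain rule of this kind supports. It assumes $F({\boldsymbol{\mu}}) = \mathrm{KL}({\boldsymbol{\mu}}\,\|\,p^*)$ and Langevin dynamics $dX_t = \nabla \log p^*(X_t)\,dt + \sqrt{2}\,dB_t$, whose law ${\boldsymbol{\mu}}_t$ solves a continuity equation with velocity $v_t$. This is not necessarily the form in which the paper states or uses the lemma.

```latex
\begin{aligned}
\frac{\delta F}{\delta \boldsymbol{\mu}}(\boldsymbol{\mu})(x) &= \log\frac{\boldsymbol{\mu}(x)}{p^*(x)} + 1, \\
\partial_t \boldsymbol{\mu}_t + \nabla\cdot(\boldsymbol{\mu}_t v_t) &= 0, \qquad v_t = -\nabla \log\frac{\boldsymbol{\mu}_t}{p^*}, \\
\frac{d}{dt} F(\boldsymbol{\mu}_t)
&= \mathbb{E}_{\boldsymbol{\mu}_t}\!\left[\left\langle \nabla \frac{\delta F}{\delta \boldsymbol{\mu}}(\boldsymbol{\mu}_t)(x),\, v_t(x)\right\rangle\right]
= -\,\mathbb{E}_{\boldsymbol{\mu}_t}\!\left[\left\|\nabla \log\frac{\boldsymbol{\mu}_t(x)}{p^*(x)}\right\|^2\right].
\end{aligned}
```

If $-\log p^*$ is $m$-strongly convex, the log-Sobolev inequality bounds this dissipation by $-2m\,F({\boldsymbol{\mu}}_t)$, which gives $F({\boldsymbol{\mu}}_t) \le e^{-2mt} F({\boldsymbol{\mu}}_0)$ for the continuous-time diffusion.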

Theorems & Definitions (22)

  • Remark 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Corollary 4
  • Remark 2
  • Lemma 5
  • Lemma 6
  • Lemma 7
  • Theorem 3
  • ...and 12 more