A sharp uniform-in-time error estimate for Stochastic Gradient Langevin Dynamics

Lei Li, Yuliang Wang

TL;DR

This paper gives a sharp uniform-in-time error analysis for Stochastic Gradient Langevin Dynamics (SGLD). The authors track the evolution of the probability densities through Fokker-Planck equations, and use Bayes' rule and Girsanov transforms to handle the random batches. They prove an $O(\eta^2)$ bound, uniform in time, on the KL divergence between SGLD and the overdamped Langevin diffusion for small step sizes $\eta$, and extend it to varying step sizes. As a corollary, they obtain an $O(\eta)$ bound on the distance between the invariant measures of SGLD and the Langevin diffusion in Wasserstein or total variation distance, improving on prior $O(\sqrt{\eta})$ results. The analysis relies on mild assumptions, including Lipschitz and confinement properties of the drift, a warm-start condition, and a Log-Sobolev inequality for the target, and yields explicit dependence on dimension and inverse temperature. These results strengthen the theoretical understanding of SGLD as a sampling method, with potential applications to related random-batch and diffusion-based algorithms.

Abstract

We establish a sharp uniform-in-time error estimate for the Stochastic Gradient Langevin Dynamics (SGLD), which is a widely used sampling algorithm. Under mild assumptions, we obtain a uniform-in-time $O(\eta^2)$ bound for the KL-divergence between the SGLD iteration and the Langevin diffusion, where $\eta$ is the step size (or learning rate). Our analysis is also valid for varying step sizes. Consequently, we are able to derive an $O(\eta)$ bound for the distance between the invariant measures of the SGLD iteration and the Langevin diffusion, in terms of Wasserstein or total variation distances. Our result can be viewed as a significant improvement over existing analyses of SGLD in the related literature.
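For readers who want a concrete picture of the iteration being analyzed, here is a minimal sketch of SGLD with a constant step size. It assumes the common form of the update (a minibatch gradient step plus Gaussian noise of variance $2\eta/\beta$, with $\beta$ the inverse temperature); the function names `sgld` and `make_quadratic_grad`, the toy quadratic potential, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sgld(grad_batch, x0, n_data, batch_size, eta, beta, n_steps, rng):
    """Run SGLD with constant step size eta and return the trajectory.

    Assumed update (standard form, not quoted from the paper):
        x <- x - eta * grad_batch(x, idx) + sqrt(2 * eta / beta) * N(0, I)
    where idx is a random minibatch drawn afresh at every step.
    """
    x = np.array(x0, dtype=float)
    traj = np.empty((n_steps, x.size))
    for k in range(n_steps):
        # Draw a random batch of data indices without replacement.
        idx = rng.choice(n_data, size=batch_size, replace=False)
        noise = rng.standard_normal(x.shape)
        x = x - eta * grad_batch(x, idx) + np.sqrt(2.0 * eta / beta) * noise
        traj[k] = x
    return traj

def make_quadratic_grad(a):
    """Hypothetical toy potential U(x) = mean_i 0.5 * |x - a_i|^2.

    The minibatch gradient is x minus the mean of the selected data points,
    which is an unbiased estimate of the full gradient x - mean(a).
    """
    def grad_batch(x, idx):
        return x - a[idx].mean(axis=0)
    return grad_batch

# Small demonstration on synthetic data (all values are illustrative).
rng = np.random.default_rng(0)
a = rng.normal(size=(100, 2))
traj = sgld(make_quadratic_grad(a), np.zeros(2), n_data=100,
            batch_size=10, eta=0.01, beta=10.0, n_steps=5000, rng=rng)
print("empirical mean after burn-in:", traj[1000:].mean(axis=0))
print("mean of the data points:     ", a.mean(axis=0))
```

For this toy potential the exact Langevin diffusion has invariant law $N(\bar{a}, \beta^{-1} I)$, with $\bar{a}$ the mean of the data points, so the two printed vectors should be close. The sketch only illustrates the iteration; it does not verify the paper's error rates.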

Paper Structure (23 sections, 19 theorems, 173 equations)

Key Result

Theorem 1

Consider the probability density functions $\rho_t$, $\bar{\rho}_t$ of $X_t$, $\bar{X}_t$ defined in (eq:overdampedlangevin) and (sgld_continuous), respectively, with constant time step $\eta$. Suppose Assumptions (ass:b) and (ass:pi) hold. Then for small $\eta$, $\sup_{t \ge 0} D_{\mathrm{KL}}(\bar{\rho}_t \,\|\, \rho_t) \le C \eta^2$, where $C$ is a positive time-independent constant.
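The step from a KL bound of order $\eta^2$ to distance bounds of order $\eta$ can be sketched with two standard inequalities. This is an illustration under stated assumptions, not the paper's own argument. It assumes the KL bound is on $D_{\mathrm{KL}}(\bar{\rho}_t \,\|\, \rho_t)$, and that $\rho_t$ satisfies a log-Sobolev inequality with a constant $C_{\mathrm{LS}}$, a symbol introduced here with the convention $\mathrm{Ent}_{\rho_t}(f^2) \le 2 C_{\mathrm{LS}} \int |\nabla f|^2 \, d\rho_t$ (conventions for this constant vary).

```latex
% Pinsker's inequality (total variation) and Talagrand's transport
% inequality (Wasserstein-2, valid when \rho_t satisfies a log-Sobolev
% inequality with constant C_LS in the convention stated above):
\begin{align*}
  \mathrm{TV}(\bar{\rho}_t, \rho_t)
    &\le \sqrt{\tfrac{1}{2}\, D_{\mathrm{KL}}(\bar{\rho}_t \,\|\, \rho_t)}
     \le \sqrt{\tfrac{C}{2}}\;\eta, \\
  W_2(\bar{\rho}_t, \rho_t)
    &\le \sqrt{2\, C_{\mathrm{LS}}\, D_{\mathrm{KL}}(\bar{\rho}_t \,\|\, \rho_t)}
     \le \sqrt{2\, C_{\mathrm{LS}}\, C}\;\eta .
\end{align*}
```

Because $C$ does not depend on $t$, both right-hand sides are uniform in time, which is what makes a passage to the invariant measures plausible; how the paper carries out that limit is not shown in this summary.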

Theorems & Definitions (33)

  • Theorem
  • Lemma 2.1
  • Proposition 3.1
  • Lemma 3.1: Holley-Stroock perturbation
  • Proof of Proposition (LSIpropagation)
  • Proposition 3.2
  • Lemma 3.2
  • Lemma 3.3
  • Proposition 3.3
  • Theorem 3.1
  • ...and 23 more