A sharp uniform-in-time error estimate for Stochastic Gradient Langevin Dynamics

Lei Li; Yuliang Wang

TL;DR

This paper gives a sharp uniform-in-time error analysis for Stochastic Gradient Langevin Dynamics (SGLD). By tracking the evolution of the probability densities through Fokker-Planck equations, and using Bayes' rule and Girsanov transforms to handle the random batches, the authors prove an $O(η^2)$ bound on the KL divergence between SGLD and the overdamped Langevin diffusion that holds uniformly in time for small step sizes, and they extend it to varying step sizes. As a corollary, they obtain an $O(η)$ bound on the distance between the invariant measures of SGLD and the Langevin diffusion in Wasserstein or total variation distance, improving on prior $O(\sqrt{η})$ results. The analysis assumes Lipschitz and confinement properties of the drift, a warm-start condition, and a Log-Sobolev inequality for the target, and it yields explicit dependence on the dimension and the inverse temperature. These results sharpen the theoretical understanding of SGLD as a sampling method and may apply to related random-batch and diffusion-based algorithms.

Abstract

We establish a sharp uniform-in-time error estimate for Stochastic Gradient Langevin Dynamics (SGLD), a widely used sampling algorithm. Under mild assumptions, we obtain a uniform-in-time $O(η^2)$ bound for the KL-divergence between the SGLD iteration and the Langevin diffusion, where $η$ is the step size (or learning rate). Our analysis is also valid for varying step sizes. Consequently, we are able to derive an $O(η)$ bound for the distance between the invariant measures of the SGLD iteration and the Langevin diffusion, in terms of Wasserstein or total variation distances. Our result can be viewed as a significant improvement over existing analyses of SGLD in the related literature.
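To make the objects in the abstract concrete, here is a minimal sketch of an SGLD iteration on a toy one-dimensional problem. It assumes the standard update $X_{k+1} = X_k - η \nabla U^{ξ_k}(X_k) + \sqrt{2η/β}\, Z_k$ with a random mini-batch $ξ_k$ and inverse temperature $β$; the paper's exact scheme is defined in equations not reproduced here. The potential, the data, and the names `sgld_step` and `run_sgld` are hypothetical and chosen only for illustration.

```python
import math
import random


def sgld_step(x, data, batch_size, eta, beta, rng):
    """One SGLD step for the toy potential U(x) = mean_i 0.5 * (x - a_i)^2.

    Assumption: the drift is -U'(x), estimated from a random mini-batch.
    """
    batch = rng.sample(data, batch_size)            # mini-batch without replacement
    grad = sum(x - a for a in batch) / batch_size   # unbiased estimate of U'(x)
    noise = rng.gauss(0.0, 1.0)                     # standard Gaussian increment
    return x - eta * grad + math.sqrt(2.0 * eta / beta) * noise


def run_sgld(data, n_steps, eta, batch_size=2, beta=1.0, x0=0.0, seed=0):
    """Run SGLD with a constant step size eta and return the whole path."""
    rng = random.Random(seed)
    x = x0
    path = []
    for _ in range(n_steps):
        x = sgld_step(x, data, batch_size, eta, beta, rng)
        path.append(x)
    return path


if __name__ == "__main__":
    # Hypothetical data: for this quadratic potential the Langevin diffusion
    # has invariant law N(mean(data), 1 / beta).
    data = [-1.0, 0.0, 0.5, 2.5]
    path = run_sgld(data, n_steps=200_000, eta=0.01)
    burned = path[len(path) // 10:]                 # drop the first 10% as burn-in
    mean = sum(burned) / len(burned)
    var = sum((x - mean) ** 2 for x in burned) / len(burned)
    print(f"empirical mean {mean:.3f}, empirical variance {var:.3f}")
```

For this linear toy model the stationary variance of the iteration can be computed by hand and differs from the target variance $1/β$ by a term of order $η$, which is consistent with a bias that vanishes as the step size shrinks. It is an illustration only and does not verify the paper's bounds.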

Paper Structure (23 sections, 19 theorems, 173 equations)

Introduction
Preliminaries
(Overdamped) Langevin Diffusion
Unadjusted Langevin Algorithm (ULA)
Main results
Assumptions
Some auxiliary results
Propagation of Log-Sobolev inequality
Moment control
Estimate of the Fisher information
Main theorems: sharp uniform-in-time error analysis for SGLD
Delayed proof for the local estimation
Methods of analysis: an overview
Proof of Proposition \ref{local_estimate}
Discussion
...and 8 more sections

Key Result

Theorem 1

Consider the probability density functions $\rho_t$ and $\bar{\rho}_t$ of $X_t$ and $\bar{X}_t$, defined in (eq:overdampedlangevin) and (sgld_continuous) respectively, with constant time step $\eta$. Suppose Assumptions ass:b and ass:pi hold. Then for small $\eta$,

$$\sup_{t \ge 0} D_{\mathrm{KL}}(\bar{\rho}_t \,\|\, \rho_t) \le C \eta^2,$$

where $C$ is a positive time-independent constant.
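The step from an $O(\eta^2)$ KL bound to an $O(\eta)$ bound in total variation or Wasserstein distance can be sketched with two standard inequalities. This is a heuristic outline under stated assumptions, not the paper's proof. Here $\nu$ and $\mu$ are generic probability measures, $\mathrm{TV}(\nu,\mu) = \sup_A |\nu(A) - \mu(A)|$, and $C_{LS}$ is an assumed log-Sobolev constant for $\mu$ in the convention $D_{\mathrm{KL}}(\nu\|\mu) \le \tfrac{C_{LS}}{2} I(\nu|\mu)$, with $I$ the relative Fisher information.

```latex
% Pinsker's inequality (no assumption on \mu):
\mathrm{TV}(\nu,\mu) \le \sqrt{\tfrac{1}{2}\, D_{\mathrm{KL}}(\nu\|\mu)} .

% Talagrand's T_2 inequality, valid when \mu satisfies the log-Sobolev
% inequality with constant C_{LS} (Otto--Villani):
W_2(\nu,\mu) \le \sqrt{2\, C_{LS}\, D_{\mathrm{KL}}(\nu\|\mu)} .

% Hence, if D_{\mathrm{KL}}(\nu\|\mu) \le C\eta^2, then
\mathrm{TV}(\nu,\mu) \le \sqrt{C/2}\;\eta ,
\qquad
W_2(\nu,\mu) \le \sqrt{2\, C_{LS}\, C}\;\eta .
```

Both inequalities take a square root of the KL divergence, which is why a second-order KL estimate translates into a first-order estimate in these distances.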

Theorems & Definitions (33)

Theorem
Lemma 2.1
Proposition 3.1
Lemma 3.1: Holley-Stroock perturbation
Proof of Proposition \ref{LSIpropagation}
Proposition 3.2
Lemma 3.2
Lemma 3.3
Proposition 3.3
Theorem 3.1
...and 23 more
