A sharp uniform-in-time error estimate for Stochastic Gradient Langevin Dynamics
Lei Li, Yuliang Wang
TL;DR
This paper delivers a sharp uniform-in-time error analysis for Stochastic Gradient Langevin Dynamics (SGLD). The authors track the evolution of probability densities through Fokker-Planck equations and use Bayes' rule and Girsanov transforms to handle the random batches. With this approach they prove, for sufficiently small step sizes, a uniform-in-time $O(η^2)$ bound on the KL divergence between the law of the SGLD iterates and that of the overdamped Langevin diffusion, and they extend it to varying step sizes. As a corollary, they obtain an $O(η)$ bound on the distance between the invariant measures of SGLD and the Langevin diffusion in Wasserstein or total variation distance, a significant improvement over prior $O(\sqrt{η})$ results. The analysis relies on mild assumptions (Lipschitz and confinement properties of the drift, a warm-start condition, and a log-Sobolev inequality for the target) and makes the dependence on dimension and inverse temperature explicit. These results strengthen the theoretical understanding of SGLD as a sampling method and may carry over to related random-batch and diffusion-based algorithms.
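For readers new to SGLD, the following Python sketch shows the kind of iteration being analyzed. It assumes the common form in which the drift is $b = -\nabla f$ with $f$ an average of per-sample potentials $f_i$, and each step is an Euler-Maruyama step of $dX = -\nabla f(X)\,dt + \sqrt{2/β}\,dW$ with the full gradient replaced by a random-batch average. The function `sgld_chain`, the helper `grad_toy`, and the toy data `a` are hypothetical illustrations, not taken from the paper.

```python
import math
import random


def sgld_chain(grad_i, n_data, x0, eta, beta, batch_size, n_steps, rng):
    """Yield 1D SGLD iterates X_1, ..., X_{n_steps} (illustrative sketch).

    Assumed update, with B_k a uniform random batch drawn without replacement:
        X_{k+1} = X_k - eta * mean_{i in B_k} grad_i(X_k) + sqrt(2 eta / beta) * Z_k,
    where Z_k ~ N(0, 1). With full batches this is the Euler-Maruyama step of
        dX = -grad f(X) dt + sqrt(2 / beta) dW,  with f = mean_i f_i.
    """
    x = x0
    noise_scale = math.sqrt(2.0 * eta / beta)
    for _ in range(n_steps):
        batch = rng.sample(range(n_data), batch_size)  # fresh random batch each step
        drift = -sum(grad_i(x, i) for i in batch) / batch_size  # unbiased estimate of -grad f(x)
        x = x + eta * drift + noise_scale * rng.gauss(0.0, 1.0)
        yield x


# Hypothetical toy data: f_i(x) = (x - a_i)^2 / 2, so f = mean_i f_i is minimized at
# mean(a) = 0 and the Langevin target exp(-beta f) is N(0, 1 / beta).
a = [-1.0, 0.0, 1.0]


def grad_toy(x, i):
    """Gradient of the hypothetical per-sample potential f_i at x."""
    return x - a[i]


rng = random.Random(0)
eta, beta = 0.1, 1.0
# Run one long chain with batch size 1 and drop the first 1000 iterates as burn-in.
samples = list(sgld_chain(grad_toy, len(a), 0.0, eta, beta, 1, 100_000, rng))[1_000:]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f}, var={var:.3f}, target var={1.0 / beta:.3f}")
```

For this toy model the stationary variance of the chain can be computed by hand as $(η \cdot 2/3 + 2/β)/(2 - η) \approx 1.088$ at $η = 0.1$, $β = 1$, so the empirical variance should land near that value rather than at the target's $1/β = 1$. The gap is the step-size bias that bounds of this kind quantify.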
Abstract
We establish a sharp uniform-in-time error estimate for Stochastic Gradient Langevin Dynamics (SGLD), a widely used sampling algorithm. Under mild assumptions, we obtain a uniform-in-time $O(η^2)$ bound for the KL divergence between the SGLD iteration and the Langevin diffusion, where $η$ is the step size (or learning rate). Our analysis is also valid for varying step sizes. Consequently, we are able to derive an $O(η)$ bound for the distance between the invariant measures of the SGLD iteration and the Langevin diffusion, in terms of Wasserstein or total variation distances. Our result can be viewed as a significant improvement over existing analyses of SGLD in the related literature.
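The step from an $O(η^2)$ KL bound to an $O(η)$ distance bound is consistent with the square-root relation between KL divergence and these metrics. For example, Pinsker's inequality gives $\mathrm{TV}(μ, ν) \le \sqrt{\mathrm{KL}(μ \,\|\, ν)/2}$, and a log-Sobolev inequality for the target implies a Talagrand-type bound of the same square-root form for $W_2$. The Python sketch below checks both rates on a one-dimensional toy problem where everything is explicit. It assumes $f(x) = x^2/2$ and models the stochastic-gradient error as independent Gaussian noise of variance `sigma2`, so that the stationary law of the chain is exactly $N(0, V)$. This is a simplification: genuine random batches give a non-Gaussian stationary law. The function names are hypothetical, and the example is not the paper's method.

```python
import math


def stationary_variance(eta, beta, sigma2):
    """Stationary variance V of the toy recursion (assumption, not the paper's setting):

        X_{k+1} = (1 - eta) X_k + eta * G_k + sqrt(2 eta / beta) Z_k,

    i.e. SGLD for f(x) = x^2 / 2 with Gaussian gradient error G_k ~ N(0, sigma2).
    Solves V = (1 - eta)^2 V + eta^2 sigma2 + 2 eta / beta.
    """
    return (eta * sigma2 + 2.0 / beta) / (2.0 - eta)


def kl_gauss(var_p, var_q):
    """KL(N(0, var_p) || N(0, var_q)) for centred 1D Gaussians."""
    r = var_p / var_q
    return 0.5 * (r - 1.0 - math.log(r))


def w2_gauss(var_p, var_q):
    """Wasserstein-2 distance between centred 1D Gaussians."""
    return abs(math.sqrt(var_p) - math.sqrt(var_q))


beta, sigma2 = 1.0, 2.0 / 3.0
target_var = 1.0 / beta  # Gibbs measure proportional to exp(-beta x^2 / 2)
for eta in (0.1, 0.05, 0.025, 0.0125):
    v = stationary_variance(eta, beta, sigma2)
    kl = kl_gauss(v, target_var)
    w2 = w2_gauss(v, target_var)
    # KL / eta^2 and W2 / eta should level off as eta shrinks.
    print(f"eta={eta:<7} KL/eta^2={kl / eta**2:.4f} W2/eta={w2 / eta:.4f}")
```

A short calculation shows that, as $η \to 0$, $\mathrm{KL}/η^2 \to (βσ^2 + 1)^2/16$ and $W_2/η \to (βσ^2 + 1)/(4\sqrt{β})$. With $β = 1$ and $σ^2 = 2/3$ these limits are $25/144$ and $5/12$. Halving $η$ therefore roughly quarters the KL divergence and halves $W_2$. This matches the $O(η^2)$ and $O(η)$ rates for this toy model only, and says nothing about the constants in the general setting.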
