Shifted Composition III: Local Error Framework for KL Divergence
Jason M. Altschuler, Sinho Chewi
TL;DR
Shifted Composition III develops a discrete-time local error framework for the KL divergence: it uses an auxiliary, shifted process to bound the KL divergence between two stochastic processes driven by different kernels. By combining local (weak/strong) error analysis with a shifted Girsanov perspective, the paper provides KL guarantees for Langevin-based sampling methods when the target is strongly log-concave (SLC), weakly log-concave (WLC), or satisfies a log-Sobolev inequality (LSI), and it delivers the first KL bounds for the randomized midpoint discretization. The framework yields sharp rates, including the optimal $\tilde O(\sqrt{d}/\varepsilon)$ rate in the SLC and LSI settings, and it gives KL control in settings without Wasserstein contractivity, where Wasserstein-based coupling analyses do not apply. These results support the analysis and design of sampling algorithms beyond strong log-concavity and beyond Wasserstein metrics, with potential relevance to high-dimensional Bayesian computation and non-Gaussian target distributions.
Abstract
Coupling arguments are a central tool for bounding the deviation between two stochastic processes, but traditionally have been limited to Wasserstein metrics. In this paper, we apply the shifted composition rule--an information-theoretic principle introduced in our earlier work--in order to adapt coupling arguments to the Kullback-Leibler (KL) divergence. Our framework combines the strengths of two previously disparate approaches: local error analysis and Girsanov's theorem. Akin to the former, it yields tight bounds by incorporating the so-called weak error, and is user-friendly in that it only requires easily verified local assumptions; and akin to the latter, it yields KL divergence guarantees and applies beyond Wasserstein contractivity. We apply this framework to the problem of sampling from a target distribution $\pi$. Here, the two stochastic processes are the Langevin diffusion and an algorithmic discretization thereof. Our framework provides a unified analysis when $\pi$ is assumed to be strongly log-concave (SLC), weakly log-concave (WLC), or to satisfy a log-Sobolev inequality (LSI). Among other results, this yields KL guarantees for the randomized midpoint discretization of the Langevin diffusion. Notably, our result: (1) yields the optimal $\tilde O(\sqrt d/\varepsilon)$ rate in the SLC and LSI settings; (2) is the first result to hold beyond the 2-Wasserstein metric in the SLC setting; and (3) is the first result to hold in \emph{any} metric in the WLC and LSI settings.
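
For readers unfamiliar with the randomized midpoint discretization mentioned above, the following is a minimal sketch, not the paper's own code. It assumes the target is $\pi \propto e^{-V}$, the Langevin diffusion is $dX_t = -\nabla V(X_t)\,dt + \sqrt{2}\,dB_t$, and the scheme takes its commonly stated form: evaluate the gradient at a uniformly random time inside each step, using Brownian increments that are consistent between the midpoint and the full step. The names `rm_langevin_step`, `sample_rm_langevin`, `grad_V`, and the step size `h` are hypothetical and chosen only for illustration.

```python
import numpy as np


def rm_langevin_step(x, grad_V, h, rng):
    """One randomized-midpoint step for dX = -grad V(X) dt + sqrt(2) dB (sketch)."""
    d = x.shape[0]
    u = rng.uniform()  # random fraction of the step, uniform on [0, 1)
    # Brownian increment over [0, u*h], then extended to the full step [0, h],
    # so the midpoint and the endpoint share the same Brownian path.
    b_mid = np.sqrt(u * h) * rng.standard_normal(d)
    b_full = b_mid + np.sqrt((1.0 - u) * h) * rng.standard_normal(d)
    # Euler-type prediction of the state at the random time u*h.
    x_mid = x - u * h * grad_V(x) + np.sqrt(2.0) * b_mid
    # Full step using the gradient evaluated at the random midpoint.
    return x - h * grad_V(x_mid) + np.sqrt(2.0) * b_full


def sample_rm_langevin(grad_V, x0, h, n_steps, seed=0):
    """Run n_steps randomized-midpoint steps from x0 and return the final state."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = rm_langevin_step(x, grad_V, h, rng)
    return x


if __name__ == "__main__":
    # Illustrative target (an assumption): standard Gaussian, V(x) = |x|^2 / 2.
    x = sample_rm_langevin(lambda z: z, np.zeros(1000), h=0.1, n_steps=200)
    print("empirical variance across coordinates:", x.var())
```

With the standard Gaussian target used in the usage example, the coordinates evolve independently given the random times, so the empirical variance across coordinates gives a rough sanity check that the chain settles near the target's unit variance. This checks only the sketch itself and says nothing about the KL rates stated in the abstract.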
