Ito Diffusion Approximation of Universal Ito Chains for Sampling, Optimization and Boosting

Aleksei Ustimenko; Aleksandr Beznosikov

TL;DR

The authors study diffusion approximations for a very general Ito chain that allows non-Gaussian, state-dependent noise and inexact drift and diffusion terms, unifying the analysis across sampling, optimization, and boosting. They develop a multi-step approach (window coupling, interpolation, covariance-corrected interpolation, and entropy-based diffusion comparison via Girsanov theory) to bound the Wasserstein-2 distance between the discrete chain and its diffusion limit. The resulting rate is $\mathcal{W}_2(\mathcal{L}(X_{k}),\mathcal{L}(Z_{k\eta}))=\mathcal{O}\big((1+(k\eta)^{1/2})e^{\mathcal{O}(k\eta)}\eta^{\theta}+ (k\eta)^{1/4}e^{\mathcal{O}(k\eta)}\eta^{\theta/2+\gamma/4}\big)$ with $\theta=\min\{\alpha, ((\gamma+1)(1+\chi_0)+(\gamma+\beta)(1-\chi_0))/4\}$. It covers a broad range of settings, including SGD with Gaussian or non-Gaussian noise, where $\theta$ recovers known rates (e.g., $\theta=1$ for certain SGD/SGLD cases). The work extends diffusion-approximation theory beyond dissipative/convex regimes and provides guarantees for sampling and optimization algorithms operating under general, potentially non-Gaussian noise.
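To make the exponents in the rate above concrete, the following minimal sketch evaluates $\theta$ and the exponent $\theta/2+\gamma/4$ of the second term directly from the stated formula. The function names and the parameter values in the two examples are illustrative assumptions only; they are not claimed to correspond to any specific setting treated in the paper.

```python
def theta(alpha, beta, gamma, chi0):
    """theta = min{alpha, ((gamma+1)(1+chi0) + (gamma+beta)(1-chi0)) / 4}."""
    second = ((gamma + 1) * (1 + chi0) + (gamma + beta) * (1 - chi0)) / 4
    return min(alpha, second)


def eta_exponents(alpha, beta, gamma, chi0):
    """Return the powers of eta in the two terms of the W2 bound:
    (theta, theta / 2 + gamma / 4)."""
    th = theta(alpha, beta, gamma, chi0)
    return th, th / 2 + gamma / 4


# Arbitrary illustrative inputs (assumptions, not values from the paper).
example_a = eta_exponents(alpha=1.0, beta=1.0, gamma=1.0, chi0=1.0)
example_b = eta_exponents(alpha=1.0, beta=0.0, gamma=0.0, chi0=0.0)
print(example_a, example_b)
```

For a fixed horizon $k\eta$, the slower of the two powers of $\eta$ determines the overall order of the bound.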

Abstract

In this work, we consider a rather general and broad class of Markov chains, Ito chains, that look like the Euler-Maruyama discretization of some Stochastic Differential Equation. The chain we study is a unified framework for theoretical analysis. It comes with almost arbitrary isotropic and state-dependent noise instead of the normal and state-independent noise used in most related papers. Moreover, in our chain the drift and diffusion coefficients can be inexact in order to cover a wide range of applications such as Stochastic Gradient Langevin Dynamics, sampling, Stochastic Gradient Descent, and Stochastic Gradient Boosting. We prove a bound in $W_{2}$-distance between the laws of our Ito chain and the corresponding differential equation. These results improve or cover most of the known estimates, and for some particular cases our analysis is the first.
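As a rough illustration of a chain that "looks like" an Euler-Maruyama discretization but is driven by non-Gaussian noise, the sketch below iterates $x_{k+1} = x_k + \eta\, b(x_k) + \sqrt{\eta}\,\sigma(x_k)\,\varepsilon_k$ with either Gaussian or Rademacher increments. This is a simplified stand-in, not the paper's chain: the Ornstein-Uhlenbeck test equation, exact coefficients, the helper names, and the step size are all assumptions chosen for the example.

```python
import numpy as np


def simulate_chain(b, sigma, x0, eta, n_steps, noise, rng):
    """Iterate x_{k+1} = x_k + eta*b(x_k) + sqrt(eta)*sigma(x_k)*eps_k."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        eps = noise(rng, x.shape)  # zero-mean, unit-variance increments
        x = x + eta * b(x) + np.sqrt(eta) * sigma(x) * eps
    return x


def gaussian(rng, shape):
    return rng.standard_normal(shape)


def rademacher(rng, shape):
    # Non-Gaussian noise: +1 or -1 with equal probability.
    return rng.choice([-1.0, 1.0], size=shape)


# Test SDE (assumed for illustration): dZ = -Z dt + sqrt(2) dW,
# whose stationary law is N(0, 1).
b = lambda x: -x
sigma = lambda x: np.sqrt(2.0) * np.ones_like(x)

rng = np.random.default_rng(0)
x0 = np.zeros(20000)          # 20000 independent copies of the chain
eta, n_steps = 0.01, 500      # time horizon k * eta = 5

var_g = simulate_chain(b, sigma, x0, eta, n_steps, gaussian, rng).var()
var_r = simulate_chain(b, sigma, x0, eta, n_steps, rademacher, rng).var()
print(var_g, var_r)
```

Both empirical variances should land near the stationary variance of the limiting diffusion, which is the kind of agreement between chain and diffusion that a $W_2$ bound quantifies.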

Paper Structure (18 sections, 21 theorems, 121 equations, 3 tables)

Introduction
Our Contribution and Related Works
Problem Setup and Assumptions
Main Results
Chain Approximation by Window Coupling
Naive Interpolation of the Approximation
Covariance Corrected Interpolation
Entropy Bound for Diffusion Approximation
Exponential Integrability
Final Result
Conclusion
Preliminaries
Missing Proofs
Useful Lemmas
Proofs of Lemmas \ref{lemma:A2_main}-\ref{lemma:A4-A3}
...and 3 more sections

Key Result

Lemma 1

Let Assumption as:key hold. If $L \ge 1 + M_{0}+M + 2M_{0}^2 + M^2 + M^2 M_{\epsilon}^2$ and $S\eta \leq 1$, then for any $t > 0$: where $R^2(t) \ge \max\left\{ 1;\max_{\eta k \le t} \mathbb{E}\| X_{k} \|^2\right\}$.
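The two hypotheses of the lemma are plain inequalities on the constants, so they can be checked numerically before the bound is applied. The sketch below does only that; the function names are hypothetical, the constants are passed in as given numbers, and the example values are arbitrary.

```python
def min_L(M0, M, M_eps):
    """Right-hand side of the condition on L:
    1 + M0 + M + 2*M0^2 + M^2 + M^2 * M_eps^2."""
    return 1 + M0 + M + 2 * M0**2 + M**2 + M**2 * M_eps**2


def hypotheses_hold(L, S, eta, M0, M, M_eps):
    """True if both L >= min_L(...) and S * eta <= 1 hold."""
    return L >= min_L(M0, M, M_eps) and S * eta <= 1


# Arbitrary illustrative constants (assumptions, not from the paper).
threshold = min_L(M0=1.0, M=1.0, M_eps=1.0)
ok = hypotheses_hold(L=7.0, S=5.0, eta=0.1, M0=1.0, M=1.0, M_eps=1.0)
too_small = hypotheses_hold(L=6.0, S=5.0, eta=0.1, M0=1.0, M=1.0, M_eps=1.0)
print(threshold, ok, too_small)
```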

Theorems & Definitions (34)

Lemma 1
Corollary 1
Lemma 2
Corollary 2
Lemma 3
Corollary 3
Theorem 1: One-time Girsanov formula for mixed Ito/adapted coefficients
Corollary 4
Lemma 4
Theorem 2
...and 24 more
