Nearly $d$-Linear Convergence Bounds for Diffusion Models via Stochastic Localization

Joe Benton, Valentin De Bortoli, Arnaud Doucet, George Deligiannidis

TL;DR

This work proves convergence bounds for diffusion models that scale linearly with the data dimension $d$ (up to logarithmic factors), assuming only that the data distribution has finite second moments and without strong smoothness assumptions. The authors combine a Girsanov-based KL analysis with a refined treatment of the discretization error inspired by stochastic localization, introducing a key lemma that controls the covariance terms arising in the path-measure comparison. Given a suitable bound on the score-estimation error and early stopping, they show that a diffusion model requires at most $\tilde{O}\bigl(\frac{d \log^2(1/\delta)}{\varepsilon^2}\bigr)$ steps to approximate an arbitrary distribution on $\mathbb{R}^d$, corrupted with Gaussian noise of variance $\delta$, to within $\varepsilon^2$ in KL divergence. This improves on earlier bounds, which were either superlinear in $d$ or required strong smoothness assumptions. The results suggest that, in theory, diffusion-based sampling scales more favorably with dimension than previously established, in line with intuition from stochastic localization and of direct relevance to high-dimensional generative modeling.

Abstract

Denoising diffusions are a powerful method to generate approximate samples from high-dimensional data distributions. Recent results provide polynomial bounds on their convergence rate, assuming $L^2$-accurate scores. Until now, the tightest bounds were either superlinear in the data dimension or required strong smoothness assumptions. We provide the first convergence bounds which are linear in the data dimension (up to logarithmic factors) assuming only finite second moments of the data distribution. We show that diffusion models require at most $\tilde{O}\bigl(\frac{d \log^2(1/\delta)}{\varepsilon^2}\bigr)$ steps to approximate an arbitrary distribution on $\mathbb{R}^d$ corrupted with Gaussian noise of variance $\delta$ to within $\varepsilon^2$ in KL divergence. Our proof extends the Girsanov-based methods of previous works. We introduce a refined treatment of the error from discretizing the reverse SDE inspired by stochastic localization.

Paper Structure (21 sections, 17 theorems, 79 equations, 1 figure, 1 table)

Key Result

Proposition 1

If we define $L_s(\mathbf{x}) = \frac{\mathrm{d} \mu_s}{\mathrm{d} p_{\textup{data}}}(\mathbf{x})$, then $\mathrm{d} L_s(\mathbf{x}) = L_s(\mathbf{x}) (\mathbf{x} - \mathbf{a}_s) \cdot \mathrm{d} W'_s$ for all $s \geq 0$.
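One immediate consequence of Proposition 1 can be worked out with Itô's formula. The derivation below is a sketch under two assumptions that the excerpt does not state: that $W'_s$ is a standard $d$-dimensional Brownian motion, and that $\mathbf{a}_s$ is the mean of $\mu_s$, as is usual in stochastic localization.

```latex
% Sketch: apply Ito's formula to \log L_s, assuming W'_s is a standard
% Brownian motion on R^d (assumption, not stated in the excerpt).
\begin{align*}
\mathrm{d} \log L_s(\mathbf{x})
  &= \frac{\mathrm{d} L_s(\mathbf{x})}{L_s(\mathbf{x})}
     - \frac{1}{2} \frac{\mathrm{d} \langle L(\mathbf{x}) \rangle_s}{L_s(\mathbf{x})^2} \\
  &= (\mathbf{x} - \mathbf{a}_s) \cdot \mathrm{d} W'_s
     - \frac{1}{2} \lVert \mathbf{x} - \mathbf{a}_s \rVert^2 \, \mathrm{d} s .
\end{align*}
% Sanity check, assuming a_s = \int x \, \mu_s(dx): the total mass is preserved,
% because the drift-free increment integrates to zero against p_data.
\begin{align*}
\mathrm{d} \int L_s(\mathbf{x}) \, p_{\textup{data}}(\mathrm{d}\mathbf{x})
  = \Bigl( \int (\mathbf{x} - \mathbf{a}_s) \, \mu_s(\mathrm{d}\mathbf{x}) \Bigr) \cdot \mathrm{d} W'_s
  = 0 .
\end{align*}
```

Under these assumptions, $L_s(\mathbf{x})$ is an exponential-martingale-type density, and $\mu_s$ remains a probability measure for all $s \geq 0$.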

Figures (1)

  • Figure 1: Illustration of a typical choice of step sizes satisfying $\gamma_k \leq \kappa \min\{1, T-t_{k+1}\}$.
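The constraint in the Figure 1 caption can be explored numerically. The following is a minimal sketch, not the authors' construction: it assumes the sampler runs from time $0$ to an early-stopping time $T - \delta$, and it greedily takes the largest step the constraint allows. The names `T`, `delta`, `kappa`, `step_schedule`, and `satisfies_constraint` are illustrative assumptions.

```python
def step_schedule(T, delta, kappa):
    """Greedy time grid 0 = t_0 < ... < t_N = T - delta (hypothetical helper).

    Each step gamma_k = t_{k+1} - t_k is the largest one that satisfies
    gamma_k <= kappa * min(1, T - t_{k+1}).
    """
    if not (0.0 < delta < T) or kappa <= 0.0:
        raise ValueError("need 0 < delta < T and kappa > 0")
    remaining = T  # remaining = T - t_k
    times = [0.0]
    while remaining > delta:
        # gamma <= kappa            <=>  next remaining >= remaining - kappa
        # gamma <= kappa * (T - t)  <=>  next remaining >= remaining / (1 + kappa)
        # Never step past the early-stopping time T - delta.
        nxt = max(remaining - kappa, remaining / (1.0 + kappa), delta)
        times.append(T - nxt)
        remaining = nxt
    return times


def satisfies_constraint(times, T, kappa, tol=1e-9):
    """Check gamma_k <= kappa * min(1, T - t_{k+1}) for every step (with a float tolerance)."""
    return all(
        (t1 - t0) <= kappa * min(1.0, T - t1) + tol
        for t0, t1 in zip(times, times[1:])
    )


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    grid = step_schedule(T=5.0, delta=1e-3, kappa=0.1)
    print(len(grid) - 1, "steps; constraint holds:", satisfies_constraint(grid, 5.0, 0.1))
```

In this sketch the steps have constant size `kappa` while more than about one unit of time remains, and shrink geometrically afterwards, so for small `kappa` the number of steps grows roughly like $T/\kappa + \log(1/\delta)/\kappa$.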

Theorems & Definitions (29)

  • Proposition 1: El Alaoui and Montanari (2022), Theorem 2
  • Proposition 2: Eldan (2020), Equation 11
  • Lemma 1
  • Theorem 1
  • Corollary 1
  • Lemma 2: Bound on discretization error
  • proof
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • ...and 19 more