Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data

Saptarshi Chakraborty; Quentin Berthet; Peter L. Bartlett

TL;DR

The results demonstrate that diffusion models naturally adapt to the intrinsic geometry of data and mitigate the curse of dimensionality, since the convergence rate depends on $d^\ast_{p,q}(\mu)$ rather than the ambient dimension.

Abstract

Despite the remarkable empirical success of score-based diffusion models, their statistical guarantees remain underdeveloped. Existing analyses often provide pessimistic convergence rates that do not reflect the intrinsic low-dimensional structure common in real data, such as that arising in natural images. In this work, we study the statistical convergence of score-based diffusion models for learning an unknown distribution $\mu$ from finitely many samples. Under mild regularity conditions on the forward diffusion process and the data distribution, we derive finite-sample error bounds on the learned generative distribution, measured in the Wasserstein-$p$ distance. Unlike prior results, our guarantees hold for all $p \ge 1$ and require only a finite-moment assumption on $\mu$, without compact-support, manifold, or smooth-density conditions. Specifically, given $n$ i.i.d. samples from $\mu$ with finite $q$-th moment and appropriately chosen network architectures, hyperparameters, and discretization schemes, we show that the expected Wasserstein-$p$ error between the learned distribution $\hat{\mu}$ and $\mu$ scales as $\mathbb{E}\, \mathbb{W}_p(\hat{\mu},\mu) = \widetilde{O}\!\left(n^{-1 / d^\ast_{p,q}(\mu)}\right)$, where $d^\ast_{p,q}(\mu)$ is the $(p,q)$-Wasserstein dimension of $\mu$. Our results demonstrate that diffusion models naturally adapt to the intrinsic geometry of data and mitigate the curse of dimensionality, since the convergence rate depends on $d^\ast_{p,q}(\mu)$ rather than the ambient dimension. Moreover, our theory conceptually bridges the analysis of diffusion models with that of GANs and the sharp minimax rates established in optimal transport. The proposed $(p,q)$-Wasserstein dimension also extends classical Wasserstein dimension notions to distributions with unbounded support, which may be of independent theoretical interest.
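To give a feel for why the exponent matters, the following minimal sketch evaluates only the polynomial part $n^{-1/d}$ of the rate stated in the abstract, once with a small intrinsic dimension and once with a large ambient dimension. The two dimension values and the function names are hypothetical choices for illustration and are not taken from the paper; constants and logarithmic factors hidden in $\widetilde{O}(\cdot)$ are ignored.

```python
def rate(n: int, dim: float) -> float:
    """Polynomial part n^(-1/dim) of the bound (no constants, no log factors)."""
    return n ** (-1.0 / dim)


def samples_needed(eps: float, dim: float) -> float:
    """Solve n^(-1/dim) = eps for n, giving n = eps^(-dim)."""
    return eps ** (-dim)


# Hypothetical values, chosen only for illustration.
intrinsic_dim = 10.0   # assumed stand-in for the (p,q)-Wasserstein dimension
ambient_dim = 784.0    # assumed ambient dimension, e.g. a 28x28 image

for n in (10**3, 10**5, 10**7):
    print(f"n={n:>9}  intrinsic: {rate(n, intrinsic_dim):.4f}  "
          f"ambient: {rate(n, ambient_dim):.4f}")
```

Under these assumed values the intrinsic-dimension rate decays visibly as $n$ grows, while the ambient-dimension rate stays close to 1, which is the curse of dimensionality the abstract refers to.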

Paper Structure (49 sections, 35 theorems, 138 equations, 1 figure)

This paper contains 49 sections, 35 theorems, 138 equations, and 1 figure.

Introduction
Contributions
Organization
A Proof of Concept Result
Background
Notations
Score Matching Diffusion Models
Forward Process
Reverse Process
Score matching
Intrinsic Data Dimension
Theoretical Analyses
Assumptions
Partition
Main Result
...and 34 more sections

Key Result

Proposition 8

For any probability measure $\mu$ and $0< p < q < \infty$,

Figures (1)

Figure 1: Average generalization error (measured by FID score) of DDPM for different values of $n$. The error bars denote the standard deviation over $10$ replications.

Theorems & Definitions (70)

Definition 1: Covering and Packing Numbers
Definition 2: Neural networks
Definition 3: Sobolev functions
Definition 4: Wasserstein $p$-distance
Definition 5: Upper and Lower Wasserstein Dimensions (Weed and Bach, 2019)
Definition 6
Definition 7: Regularity dimensions
Definition 8: Upper packing dimension
Proposition 8
Theorem 9
...and 60 more
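As a concrete instance of the Wasserstein $p$-distance named in Definition 4, the sketch below computes it between two one-dimensional empirical distributions with the same number of samples. It relies on the standard fact that in one dimension the optimal coupling matches sorted samples. The function name and the restriction to equal sample sizes are assumptions made for this illustration; the paper itself works with general distributions on $\mathbb{R}^d$.

```python
from typing import Sequence


def wasserstein_p_1d(xs: Sequence[float], ys: Sequence[float], p: float = 1.0) -> float:
    """Wasserstein-p distance between two equal-size 1D empirical distributions.

    In one dimension the optimal transport plan pairs the i-th smallest
    point of xs with the i-th smallest point of ys.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    if p < 1:
        raise ValueError("p must be at least 1")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    # Average transport cost |x - y|^p under the sorted (monotone) matching.
    mean_cost = sum(abs(a - b) ** p for a, b in zip(xs_sorted, ys_sorted)) / len(xs)
    return mean_cost ** (1.0 / p)


# Shifting every sample by 1 gives distance 1 for any p >= 1.
print(wasserstein_p_1d([0.0, 1.0], [1.0, 2.0], p=2.0))
```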
