Convergence of Diffusion Models Under the Manifold Hypothesis in High-Dimensions

Iskander Azangulov; George Deligiannidis; Judith Rousseau

TL;DR

This work studies diffusion models under the manifold hypothesis and shows that score learning can achieve rates independent of the ambient dimension by connecting diffusion processes to extrema of Gaussian processes. It introduces a dimension-reduction strategy that replaces the high-dimensional manifold with a union of low-dimensional polynomial pieces, enabling neural-network-based score estimation with polylog-sized networks and near-optimal statistical rates. For sampling, the results give Kullback-Leibler guarantees independent of the ambient dimension and Wasserstein bounds that scale as $O(\sqrt{D})$, which helps explain DDPMs' strong performance on data with intrinsic low-dimensional structure. The framework points toward scalable diffusion modeling in very high dimensions and suggests practical NN architectures and manifold-approximation schemes for efficient training and sampling.

Abstract

Denoising Diffusion Probabilistic Models (DDPMs) are powerful state-of-the-art methods for generating synthetic data from high-dimensional data distributions and are widely used for image, audio, and video generation, as well as many other applications in science and beyond. The \textit{manifold hypothesis} states that high-dimensional data often lie on lower-dimensional manifolds within the ambient space, and is widely believed to hold in these examples. While recent results have provided invaluable insight into how diffusion models adapt to the manifold hypothesis, they do not capture the great empirical success of these models, making this a very fruitful research direction. In this work, we study DDPMs under the manifold hypothesis and prove that they achieve rates independent of the ambient dimension in terms of score learning. In terms of sampling complexity, we obtain rates independent of the ambient dimension w.r.t. the Kullback-Leibler divergence, and $O(\sqrt{D})$ in the ambient dimension $D$ w.r.t. the Wasserstein distance. We do this by developing a new framework connecting diffusion models to the well-studied theory of extrema of Gaussian processes.

Paper Structure (18 sections, 8 theorems, 31 equations)

This paper contains 18 sections, 8 theorems, 31 equations.

Introduction
Related Works
Our Contribution
Preliminaries
Score-Matching Generative Models
Manifold Hypothesis
Notation
Convention 1.
Notation 1.
Notation 2.
Notation 3.
Notation 4.
Approximation of a Score Function in High Dimension
Construction of the estimator
Concentration of the Score Function
...and 3 more sections

Key Result

proposition 1

For any $\varepsilon < r_0$ there is an $\varepsilon$-dense and $\varepsilon/2$-sparse set $\mathcal{G}=\{G_1,\ldots,G_N\}\subset M \subset \mathbb{R}^D$; moreover, $N = N(\varepsilon) \le (\varepsilon/2)^{-d}\operatorname{Vol} M$.
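The kind of set described in Proposition 1 can be illustrated numerically. The sketch below is not the paper's construction: it runs a standard greedy (farthest-point) selection on a finite sample of a unit circle ($d = 1$, $\operatorname{Vol} M = 2\pi$) isometrically embedded in $\mathbb{R}^{50}$. It assumes that "$\varepsilon$-dense" means every point lies within Euclidean distance $\varepsilon$ of some $G_i$ and that "$\varepsilon/2$-sparse" means the $G_i$ are pairwise at least $\varepsilon/2$ apart; these readings, the choice $\varepsilon = 0.2$ (taken to be below $r_0$), and the helper name `greedy_net` are all assumptions made for illustration.

```python
import numpy as np


def greedy_net(points: np.ndarray, eps: float) -> np.ndarray:
    """Greedy farthest-point selection (hypothetical helper).

    Returns centers such that every row of `points` is within `eps` of a
    center, and any two centers are more than `eps` apart.
    """
    chosen = []
    # dist_to_net[i] = distance from points[i] to its nearest chosen center
    dist_to_net = np.full(len(points), np.inf)
    while True:
        i = int(np.argmax(dist_to_net))
        if dist_to_net[i] <= eps:
            break  # every sample point is now covered
        chosen.append(i)
        d = np.linalg.norm(points - points[i], axis=1)
        dist_to_net = np.minimum(dist_to_net, d)
    return points[chosen]


rng = np.random.default_rng(0)
D, n, eps = 50, 2000, 0.2  # ambient dimension, sample size, net scale (assumed)

# Sample the unit circle, then embed it isometrically into R^D.
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # shape (n, 2)
Q, _ = np.linalg.qr(rng.standard_normal((D, 2)))           # orthonormal columns
points = circle @ Q.T                                      # shape (n, D)

net = greedy_net(points, eps)

# Bound from Proposition 1 with d = 1 and Vol M = 2*pi.
bound = (eps / 2) ** (-1) * 2.0 * np.pi
print(f"N = {len(net)}, bound = {bound:.1f}")
```

Because the embedding is an isometry, the size of the net depends on $\varepsilon$, the intrinsic dimension $d$ and the volume of the circle, not on $D$. Note that the sketch only checks density against the finite sample, not against the whole manifold.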

Theorems & Definitions (17)

remark 1
proposition 1
proof
definition 1
theorem 1
corollary 1
proof
theorem 2
proof
proposition 2
...and 7 more
