Diffusion Models are Minimax Optimal Distribution Estimators

Kazusato Oko; Shunta Akiyama; Taiji Suzuki

TL;DR

This work establishes a statistical learning theory for diffusion-based distribution estimation by proving nearly minimax estimation rates in TV and W1 when the true density lies in a Besov space and the score is learned via neural networks. The authors introduce a diffused B-spline basis to approximate the score and convert approximation error into estimation error, yielding explicit rates and network-size bounds. They further show that diffusion models adapt to intrinsic dimensionality, avoiding the curse of dimensionality under a manifold assumption, and they propose a score-network switching scheme to tighten Wasserstein-rate bounds. Overall, the paper provides rigorous guarantees for diffusion models as distribution estimators and highlights practical strategies for achieving optimal generalization in high-dimensional settings.
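For orientation, the rates referred to above can be written out as follows. This is a sketch as we understand the paper's main theorems, not a restatement of them: $n$ (sample size), $d$ (data dimension), $s$ (Besov smoothness of the true density), and $\delta > 0$ (an arbitrarily small constant) are notation assumed here, and the exact logarithmic factors and conditions are omitted.

```latex
% Estimation rates, up to polylogarithmic factors (assumed notation: n, d, s, \delta).
\mathbb{E}\bigl[\mathrm{TV}(p_0, \hat{p})\bigr] \lesssim n^{-\frac{s}{2s+d}},
\qquad
\mathbb{E}\bigl[W_1(p_0, \hat{p})\bigr] \lesssim n^{-\frac{s+1-\delta}{2s+d}} .
```

Both exponents match the classical minimax rates for estimating a density of smoothness $s$ in $d$ dimensions, up to the logarithmic factors and the small $\delta$ loss in the $W_1$ case.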

Abstract

While efficient distribution learning is no doubt behind the groundbreaking success of diffusion modeling, its theoretical guarantees are quite limited. In this paper, we provide the first rigorous analysis of the approximation and generalization abilities of diffusion modeling for well-known function spaces. The highlight of this paper is that when the true density function belongs to the Besov space and the empirical score matching loss is properly minimized, the generated data distribution achieves the nearly minimax optimal estimation rates in the total variation distance and in the Wasserstein distance of order one. Furthermore, we extend our theory to demonstrate how diffusion models adapt to low-dimensional data distributions. We expect these results to advance the theoretical understanding of diffusion modeling and its ability to generate verisimilar outputs.
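To make "empirical score matching loss" concrete, here is a minimal NumPy sketch of a denoising score matching loss at a single diffusion time. It is an illustration under stated assumptions, not the paper's estimator: the forward process is assumed to be a standard Ornstein-Uhlenbeck process with $m_t = e^{-t}$ and $\sigma_t^2 = 1 - e^{-2t}$, and the names `ou_coefficients` and `dsm_loss` are hypothetical. No network is trained; two fixed candidate scores are compared.

```python
import numpy as np

def ou_coefficients(t):
    """Mean scaling m_t and noise level sigma_t of the assumed OU forward process."""
    m_t = np.exp(-t)
    sigma_t = np.sqrt(1.0 - np.exp(-2.0 * t))
    return m_t, sigma_t

def dsm_loss(score_fn, x0, t, rng):
    """Monte Carlo denoising score matching loss at a single time t.

    x0 has shape (n, d). The regression target is the conditional score
    grad log p_t(x_t | x_0) = -(x_t - m_t * x0) / sigma_t**2 = -eps / sigma_t.
    """
    m_t, sigma_t = ou_coefficients(t)
    eps = rng.standard_normal(x0.shape)   # Gaussian noise, one draw per sample
    x_t = m_t * x0 + sigma_t * eps        # noised sample at time t
    target = -eps / sigma_t               # conditional score at x_t
    residual = score_fn(x_t, t) - target
    return float(np.mean(np.sum(residual ** 2, axis=1)))

# Toy check: for x0 ~ N(0, I) the marginal p_t stays N(0, I),
# so the true score at every t is simply -x.
data_rng = np.random.default_rng(0)
x0 = data_rng.standard_normal((20000, 2))

true_score = lambda x, t: -x
zero_score = lambda x, t: np.zeros_like(x)

# Reuse the same noise seed so both candidates see identical noised samples.
loss_true = dsm_loss(true_score, x0, 0.5, np.random.default_rng(1))
loss_zero = dsm_loss(zero_score, x0, 0.5, np.random.default_rng(1))
print(loss_true, loss_zero)
```

On this toy problem the true score attains the smaller loss. In practice the loss is additionally averaged over $t$ and minimized over a class of neural networks rather than evaluated for fixed candidates.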

Paper Structure (80 sections, 101 theorems, 591 equations)

Introduction
On the effect of score approximation errors
Generalization error analyses
Our contributions
Other related works
Preliminaries
Diffusion modeling
Score matching
Class of neural networks
Density estimation in the Besov space
Assumptions
Approximation of the true score
Proof overview
Approximation via the diffused B-spline Basis
Utilizing the smoothness induced by the noise
...and 65 more sections

Key Result

Theorem 3.1

There exists a neural network $\phi_{\rm score}\in \Phi(L,W,S,B)$ that satisfies, for all $t\in [\underline{T},\overline{T}]$, … Here, $L$, $W$, $S$, and $B$ are evaluated as $L = \mathcal{O}(\log^4 N)$, $\|W\|_\infty = \mathcal{O}(N\log^6 N)$, $S = \mathcal{O}(N\log^8 N)$, and $B = \exp(\mathcal{O}(\log^4 N))$. Moreover, we can take $\phi_{\rm score}$ satisfying $\|\phi_{\rm score}(\cdot,t)\|_\infty =$ …

Theorems & Definitions (189)

Definition 2.1
Definition 2.2
Definition 2.3: Besov space $B_{p,q}^s(\Omega)$
Theorem 3.1
Lemma 3.2: Informal version of \ref{Lemma:SuzukiBesov}; suzuki2018adaptivity
Lemma 3.3: See also \ref{Lemma:MandSigma}
Lemma 3.4
Lemma 3.5
Lemma 3.6
Lemma 4.1
...and 179 more
