From optimal score matching to optimal sampling

Zehao Dou; Subhodh Kotekal; Zhehao Xu; Harrison H. Zhou

TL;DR

This work derives sharp minimax rates for score estimation in score-based diffusion models under a variance-exploding forward SDE, showing that the minimax risk $\inf_{\hat s}\sup_{f\in\mathcal F_\alpha} \mathbb E\int |\hat s(x,t)-s(x,t)|^2 p(x,t)\,dx$ is bounded by $C\min\{1/(nt^2),\ 1/(nt^{3/2}),\ t^{\alpha-1} + n^{-2(\alpha-1)/(2\alpha+1)}\}$. It develops regime-specific estimators (a regularized plug-in estimator in very high noise, unbiased estimators in high noise, and kernel-based methods in low noise) and proves matching lower bounds, establishing that diffusion-model sampling is minimax-optimal for $\alpha$-Hölder densities without extraneous logarithmic factors or early stopping. The paper further shows that the diffusion-based estimator achieves the sharp minimax density-estimation rate $\mathbb E\,\mathrm{TV}(\hat f,f)^2 \lesssim n^{-2\alpha/(2\alpha+1)}$, with dimension-dependent extensions for multivariate data and corresponding $W_1$ bounds. Overall, the results show that score-based diffusion models can be provably optimal for sampling and density estimation, and they provide a regime-aware methodology for score estimation and density reconstruction in both univariate and multivariate settings.
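To make the object being estimated concrete, here is a minimal sketch of a naive plug-in score estimate for the diffused density $f * \mathcal N(0,t)$. It replaces $f$ by the empirical measure of the samples, so the estimated density is a Gaussian mixture and its score has a closed form. This is an illustration only: it is not the regularized, unbiased, or kernel-based estimator developed in the paper, and the function name `plug_in_score` and the uniform test data are assumptions made for the example.

```python
import numpy as np

def plug_in_score(x, samples, t):
    """Naive plug-in estimate of the score of f * N(0, t) at the points x.

    Hypothetical helper: f is replaced by the empirical measure of `samples`,
    giving the density (1/n) * sum_i N(x; X_i, t). Its score equals
    (weighted mean of the X_i - x) / t, with Gaussian weights in x - X_i.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    diff = x[:, None] - samples[None, :]           # shape (len(x), n)
    log_w = -diff**2 / (2.0 * t)                   # unnormalized log weights
    log_w -= log_w.max(axis=1, keepdims=True)      # stabilize the exponentials
    w = np.exp(log_w)
    post_mean = (w * samples[None, :]).sum(axis=1) / w.sum(axis=1)
    return (post_mean - x) / t

# Usage: samples from a density supported on [-1, 1] (uniform, as an assumption).
rng = np.random.default_rng(0)
data = rng.uniform(-1.0, 1.0, size=500)
grid = np.linspace(-2.0, 2.0, 5)
print(plug_in_score(grid, data, t=0.5))
```

Subtracting the row-wise maximum before exponentiating keeps the weights finite when $t$ is small, which is exactly the low-noise regime where such a naive estimator is least reliable.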

Abstract

The recent, impressive advances in algorithmic generation of high-fidelity image, audio, and video are largely due to great successes in score-based diffusion models. A key implementing step is score matching, that is, the estimation of the score function of the forward diffusion process from training data. As shown in earlier literature, the total variation distance between the law of a sample generated from the trained diffusion model and the ground truth distribution can be controlled by the score matching risk. Despite the widespread use of score-based diffusion models, basic theoretical questions concerning exact optimal statistical rates for score estimation and its application to density estimation remain open. We establish the sharp minimax rate of score estimation for smooth, compactly supported densities. Formally, given $n$ i.i.d. samples from an unknown $\alpha$-Hölder density $f$ supported on $[-1, 1]$, we prove that the minimax rate of estimating the score function of the diffused distribution $f * \mathcal{N}(0, t)$ with respect to the score matching loss is $\frac{1}{nt^2} \wedge \frac{1}{nt^{3/2}} \wedge (t^{\alpha-1} + n^{-2(\alpha-1)/(2\alpha+1)})$ for all $\alpha > 0$ and $t \ge 0$. As a consequence, it is shown that the law $\hat{f}$ of a sample generated from the diffusion model achieves the sharp minimax rate $\mathbb{E}(\mathrm{TV}(\hat{f}, f)^2) \lesssim n^{-2\alpha/(2\alpha+1)}$ for all $\alpha > 0$, without the extraneous logarithmic terms that are prevalent in the literature and without the early stopping that, to the best of our knowledge, all existing procedures have required.
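The three-term rate above can be evaluated numerically to see which term is smallest at a given noise level. The sketch below drops all constants, and the regime labels attached to each term are an inference from the order in which the estimators are listed (very high, high, low noise), not a definition taken from the paper. The function name `score_rate` is hypothetical.

```python
def score_rate(n, t, alpha):
    """Evaluate the rate 1/(n t^2) ^ 1/(n t^{3/2}) ^ (t^{a-1} + n^{-2(a-1)/(2a+1)}).

    Constants are dropped. Returns the minimum and the label of the term
    that attains it. The labels are an assumed mapping of terms to regimes.
    """
    terms = {
        "very_high_noise": 1.0 / (n * t**2),
        "high_noise": 1.0 / (n * t**1.5),
        "low_noise": t**(alpha - 1) + n**(-2.0 * (alpha - 1) / (2.0 * alpha + 1)),
    }
    active = min(terms, key=terms.get)
    return terms[active], active

# Usage: scan a few noise levels for n = 10_000 samples and smoothness alpha = 2.
for t in (4.0, 0.5, 1e-3):
    value, label = score_rate(10_000, t, alpha=2.0)
    print(f"t={t:g}: rate ~ {value:.3e} ({label})")
```

The function requires $t > 0$, since the first two terms are infinite at $t = 0$.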

Paper Structure (44 sections, 47 theorems, 337 equations, 1 table, 1 algorithm)

Introduction
Background on diffusion models
Related work
Main contributions
Methodology
Very high noise regime
High noise regime
Low noise regime
Distribution estimation
Main results
Score estimation upper bound
Very high noise regime
High noise regime
Low noise regime
Score estimation lower bound
...and 29 more sections

Key Result

Theorem 1

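Let $\alpha > 0$. There exists a constant $C = C(\alpha, L)$ depending only on $\alpha$ and $L$ such that $\inf_{\hat s}\sup_{f\in\mathcal F_\alpha} \mathbb E\int |\hat s(x,t)-s(x,t)|^2 p(x,t)\,dx \le C\left(\frac{1}{nt^2} \wedge \frac{1}{nt^{3/2}} \wedge \left(t^{\alpha-1} + n^{-2(\alpha-1)/(2\alpha+1)}\right)\right)$ for all $t \ge 0$.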
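As a short worked calculation, the noise levels at which the three terms of the rate stated in the abstract cross over can be found by equating them, ignoring constants. The symbol $t_*$ is introduced here for illustration and is not notation from the paper.

```latex
% First vs. second term:
\frac{1}{nt^2} \le \frac{1}{nt^{3/2}} \iff t \ge 1 .

% Second term vs. the t-independent part of the third term:
\frac{1}{nt^{3/2}} = n^{-\frac{2(\alpha-1)}{2\alpha+1}}
\iff t^{3/2} = n^{-1+\frac{2(\alpha-1)}{2\alpha+1}} = n^{-\frac{3}{2\alpha+1}}
\iff t = t_* := n^{-\frac{2}{2\alpha+1}} .

% At t = t_*, the two summands of the third term coincide:
t_*^{\alpha-1} = n^{-\frac{2(\alpha-1)}{2\alpha+1}} .
```

So, up to constants, the first term is the smallest for $t \ge 1$, the second for $t_* \le t \le 1$, and the third for $t \le t_*$.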

Theorems & Definitions (85)

Remark 1
Remark 2
Remark 3
Remark 4
Theorem 1
Theorem 2
Theorem 3
Theorem 4
Theorem 5
Theorem 6
...and 75 more
