To smooth a cloud or to pin it down: Guarantees and Insights on Score Matching in Denoising Diffusion Models

Francisco Vargas, Teodora Reu, Anna Kerekes, Michael M Bronstein

TL;DR

This work develops a stochastic-control perspective on score matching for denoising diffusion models, linking VP-SDE scores to the OU semigroup and leveraging Föllmer drift insights to obtain neural-network expressiveness guarantees for score approximation. It provides rigorous regularity results and entropy bounds for the OU-based framework, enabling controlled approximation errors and sampling guarantees. The authors contrast VP-SDE with pinned Brownian motion through comprehensive simulations on synthetic and image data, showing VP-SDE can achieve better score estimation and sampling with the same network budget. Overall, the paper advances theoretical understanding and practical performance of continuous-time diffusion samplers, with potential impact on scalable, accurate density estimation and generative modelling.
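
For reference, the link between VP-SDE scores and the OU semigroup mentioned above can be written out in the time-homogeneous case. This is a sketch under an assumed normalisation ($\mathrm{d}Y_t = -Y_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t$ with $Y_0 \sim \pi$, and $\gamma$ the density of ${\mathcal{N}}(0, I)$); the paper's own scaling and notation may differ.

$$
\begin{aligned}
P_t h(y) &= \mathbb{E}\big[h\big(e^{-t}y + \sqrt{1-e^{-2t}}\,Z\big)\big], \quad Z \sim \mathcal{N}(0, I),\\
p_t(y) &= \gamma(y)\, P_t\!\left(\tfrac{\pi}{\gamma}\right)(y),\\
\nabla \log p_t(y) &= -y + \nabla \log P_t\!\left(\tfrac{\pi}{\gamma}\right)(y).
\end{aligned}
$$

The second line uses the reversibility of the OU process with respect to $\gamma$; the third follows by taking logarithms and gradients, so the score splits into a linear term and the gradient of the log of a semigroup applied to the density ratio $\pi/\gamma$.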

Abstract

Denoising diffusion models are a class of generative models which have recently achieved state-of-the-art results across many domains. Gradual noise is added to the data using a diffusion process, which transforms the data distribution into a Gaussian. Samples from the generative model are then obtained by simulating an approximation of the time reversal of this diffusion initialized by Gaussian samples. Recent research has explored adapting diffusion models for sampling and inference tasks. In this paper, we leverage known connections to stochastic control akin to the Föllmer drift to extend established neural network approximation results for the Föllmer drift to denoising diffusion models and samplers.
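
The noising and time-reversal steps described in the abstract can be illustrated with a minimal sketch. It assumes a one-dimensional Gaussian data distribution, so that the score of every noised marginal is available in closed form; in practice a neural network would approximate it. The function names, the forward process $\mathrm{d}Y_t = -Y_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t$, and the Euler-Maruyama discretisation are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def vp_forward_marginal(mu0, var0, t):
    """Mean and variance at time t of dY = -Y dt + sqrt(2) dW, Y_0 ~ N(mu0, var0)."""
    decay = np.exp(-t)
    return decay * mu0, decay**2 * var0 + (1.0 - decay**2)


def score(y, t, mu0, var0):
    """Exact score (gradient of the log-density) of the Gaussian marginal at time t."""
    mean, var = vp_forward_marginal(mu0, var0, t)
    return -(y - mean) / var


def sample_reverse(n, T, steps, mu0, var0, seed=0):
    """Euler-Maruyama simulation of the time reversal, started from N(0, 1).

    Starting from N(0, 1) instead of the true marginal at time T introduces
    a small mixing error that shrinks as T grows.
    """
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = rng.standard_normal(n)
    for k in range(steps):
        t = T - k * dt  # corresponding forward time
        # Reverse drift: -f(x) + g^2 * score, with f(x) = -x and g = sqrt(2).
        drift = x + 2.0 * score(x, t, mu0, var0)
        x = x + drift * dt + np.sqrt(2.0 * dt) * rng.standard_normal(n)
    return x


if __name__ == "__main__":
    samples = sample_reverse(n=20000, T=5.0, steps=500, mu0=2.0, var0=0.25)
    # Sample mean and standard deviation should be close to the data values 2.0 and 0.5.
    print(samples.mean(), samples.std())
```

Replacing `score` with a learned approximation would add the network's approximation error on top of the mixing error from the Gaussian initialisation.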

Paper Structure (38 sections, 13 theorems, 118 equations, 8 figures, 1 table)

This paper contains 38 sections, 13 theorems, 118 equations, 8 figures, 1 table.

Key Result

Proposition 3.1

Suppose the assumptions stated in the appendix are in force. Let $L$ denote the maximum of the Lipschitz constants of $f$ and $\nabla f$. Then for all $0 < \epsilon < 16L^2/c^2$, there exists a neural net $\hat{v} : \mathbb{R}^d \times [0,1] \to \mathbb{R}^d$ with size polynomial in $1/\epsilon, d, L, c, 1/c$ with $x_0 \sim p_1 = {\mathrm{Law}}(y_1) \approx {\mathcal{N}}(0, I)$ with the drift $\hat{b}(x,t)$ ...

Figures (8)

  • Figure 1: a) Noise-adding process for exact reversal. The distribution ${\mathcal{N}}(0,I)$ is drawn for comparison to $p_T$. b) Exact and approximate time reversal starting from ${\mathcal{N}}(0,I)$; the former exhibits only the mixing error, whilst the latter also incorporates the network's approximation error.
  • Figure 2: Comparison of distances between $\pi$ and $p^{\mathrm{model}}_\theta$ at time $T$ over $50$-dimensional Funnel and GMM-10 datasets.
  • Figure 3: Comparison of distances between $\pi$ and $p^{\mathrm{model}}_\theta$ at time $T$ over Funnel and Mixed Gaussians varying in dimensions.
  • Figure 4: Samples (2D slice $(x_1, x_0)$) from PBM and VP trained on various sizes of the Funnel distribution. The background shows the probability density of the Funnel distribution.
  • Figure 5: MNIST samples
  • ...and 3 more figures

Theorems & Definitions (33)

  • Proposition 3.1
  • Remark 3.2
  • Definition 3.3
  • Definition 3.4
  • Remark 3.5
  • Lemma 3.6
  • Lemma 3.7
  • Lemma 3.8
  • Lemma 3.9
  • Lemma 3.10
  • ...and 23 more