
Optimal Denoising in Score-Based Generative Models: The Role of Data Regularity

Eliot Beyler, Francis Bach

TL;DR

The paper investigates one-step denoising in score-based generative models, contrasting full-denoising with half-denoising and examining how data regularity and manifold structure affect performance. It develops rigorous bounds showing that half-denoising achieves an $O(\sigma^4)$ error in both MMD and $W_2$ distance for regular densities, while full-denoising achieves an $O(\sigma^2)$ error but can outperform half-denoising in singular settings such as Dirac mixtures or densities with low-dimensional support. The analysis extends to subspace scenarios and mixtures, revealing a trade-off between correcting toward the subspace and reducing distortion on the subspace itself; in particular, full-denoising can alleviate the curse of dimensionality when the data lie on low-dimensional linear structures. The results offer practical guidance for single-step and multi-step diffusion-model design (e.g., DDIM and related samplers), suggesting that the choice of denoiser should be adapted to the regularity of the target density and its subspace geometry, with potential extensions to linear-manifold and more general manifold settings.
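The $O(\sigma^2)$ versus $O(\sigma^4)$ contrast can be checked on the simplest regular density, a centred 1D Gaussian $X \sim \mathcal{N}(0, s^2)$, where every quantity has a closed form. This is a minimal sketch under stated assumptions: full-denoising is taken to be Tweedie's step $\hat x = y + \sigma^2 \nabla \log p_Y(y)$ and half-denoising $\hat x = y + \frac{\sigma^2}{2} \nabla \log p_Y(y)$, and the MMD uses a Gaussian kernel with a hypothetical `bandwidth` parameter; the paper's kernel and constants may differ. The helper names are illustrative only.

```python
import math

def full_std(sigma: float, s: float = 1.0) -> float:
    # X ~ N(0, s^2), Y = X + sigma * Z ~ N(0, v) with v = s^2 + sigma^2,
    # and grad log p_Y(y) = -y / v. Full-denoising: x_hat = y + sigma^2 * score
    # = (s^2 / v) * y, so std(x_hat) = s^2 / sqrt(v).
    v = s**2 + sigma**2
    return s**2 / math.sqrt(v)

def half_std(sigma: float, s: float = 1.0) -> float:
    # Half-denoising (assumed form): x_hat = y + (sigma^2 / 2) * score
    # = (1 - sigma^2 / (2 v)) * y, so std(x_hat) = (s^2 + sigma^2 / 2) / sqrt(v).
    v = s**2 + sigma**2
    return (s**2 + sigma**2 / 2) / math.sqrt(v)

def gaussian_mmd(a: float, b: float, bandwidth: float = 1.0) -> float:
    # MMD between N(0, a^2) and N(0, b^2) for k(x, y) = exp(-(x - y)^2 / (2 bandwidth^2)).
    # For independent centred Gaussians U, V:
    # E[k(U, V)] = bandwidth / sqrt(bandwidth^2 + var(U - V)).
    l2 = bandwidth**2
    kxx = bandwidth / math.sqrt(l2 + 2 * a**2)
    kyy = bandwidth / math.sqrt(l2 + 2 * b**2)
    kxy = bandwidth / math.sqrt(l2 + a**2 + b**2)
    return math.sqrt(max(kxx + kyy - 2 * kxy, 0.0))  # clip tiny negative round-off

for sigma in (0.4, 0.2, 0.1):
    print(f"sigma={sigma}: MMD full={gaussian_mmd(1.0, full_std(sigma)):.3e}, "
          f"half={gaussian_mmd(1.0, half_std(sigma)):.3e}")
```

In this toy, halving $\sigma$ shrinks the full-denoising MMD by a factor of about 4 and the half-denoising MMD by a factor of about 16, in line with the two stated rates.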

Abstract

Score-based generative models achieve state-of-the-art sampling performance by denoising a distribution perturbed by Gaussian noise. In this paper, we focus on a single deterministic denoising step and compare the optimal denoiser for the quadratic loss, which we call "full-denoising", to the alternative "half-denoising" introduced by Hyvärinen (2024). We show that measuring performance in terms of distances between distributions tells a more nuanced story, with different assumptions on the data leading to very different conclusions. We prove that half-denoising is better than full-denoising for sufficiently regular densities, while full-denoising is better for singular densities such as mixtures of Dirac measures or densities supported on a low-dimensional subspace. In the latter case, we prove that full-denoising can alleviate the curse of dimensionality under a linear manifold hypothesis.
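For the singular case, a sketch on the two-point target $\frac{\delta_{-\mu} + \delta_\mu}{2}$ (the setting of Figure 4) suggests why full-denoising wins: its output $\mu \tanh(\mu y/\sigma^2)$ snaps back onto the atoms, whereas half-denoising keeps about half of the noise. It assumes the update $\hat x = y + \alpha\sigma^2\nabla\log p_Y(y)$ with $\alpha = 1$ (full) or $\alpha = \frac12$ (half); the function names and the Riemann-sum quadrature grid are illustrative choices, not the paper's procedure.

```python
import numpy as np

def denoise_two_diracs(y, sigma, mu, alpha):
    # X uniform on {-mu, +mu} and Y = X + sigma * Z give
    # grad log p_Y(y) = (mu * tanh(mu * y / sigma^2) - y) / sigma^2.
    score = (mu * np.tanh(mu * y / sigma**2) - y) / sigma**2
    # alpha = 1: full-denoising (equals mu * tanh(mu * y / sigma^2)); alpha = 0.5: half.
    return y + alpha * sigma**2 * score

def w2_two_diracs(sigma, alpha, mu=1.0):
    # The denoiser is odd and the law of Y is symmetric, so the denoised law is
    # symmetric and the monotone (optimal) 1D coupling sends x_hat < 0 to -mu and
    # x_hat > 0 to +mu: W2^2 = E[(|x_hat| - mu)^2]. By symmetry we may condition
    # on X = +mu; the expectation over Z ~ N(0, 1) is a Riemann sum on a fine grid.
    z = np.linspace(-12.0, 12.0, 240_001)
    weights = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi) * (z[1] - z[0])
    x_hat = denoise_two_diracs(mu + sigma * z, sigma, mu, alpha)
    return float(np.sqrt(np.sum(weights * (np.abs(x_hat) - mu) ** 2)))

for sigma in (0.3, 0.1):
    print(f"sigma={sigma}: W2 full={w2_two_diracs(sigma, 1.0):.2e}, "
          f"half={w2_two_diracs(sigma, 0.5):.2e}")
```

In this example the half-denoising $W_2$ error stays close to $\sigma/2$, while the full-denoising error decays exponentially in $\mu^2/\sigma^2$.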

Paper Structure

This paper contains 26 sections, 14 theorems, 172 equations, 5 figures.

Key Result

Proposition 1

Assume that $\mathbb{E}[\Vert \nabla \log p_X(X)\Vert^2]\leq C$. Then for all $\alpha\in\mathbb{R}$, and furthermore, for $\alpha = \frac{1}{2}$,
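Proposition 1 is stated for every $\alpha\in\mathbb{R}$ with $\alpha = \frac{1}{2}$ singled out. Assuming (our reading, not stated in this summary) that $\alpha$ indexes the denoisers $\hat x = y + \alpha\sigma^2\nabla\log p_Y(y)$, so that $\alpha = 1$ is full-denoising and $\alpha = \frac12$ half-denoising, a 1D Gaussian target $\mathcal{N}(0, s^2)$ shows why $\alpha = \frac12$ is special: a first-order expansion gives $W_2 \approx |\tfrac12 - \alpha|\,\sigma^2/s$, and the $\sigma^2$ term cancels exactly at $\alpha = \frac12$, leaving $W_2 \approx \sigma^4/(8s^3)$. The helper `w2_gaussian_alpha` is hypothetical.

```python
import math

def w2_gaussian_alpha(sigma: float, alpha: float, s: float = 1.0) -> float:
    # Target X ~ N(0, s^2) in 1D; Y = X + sigma * Z ~ N(0, v) with v = s^2 + sigma^2
    # and score grad log p_Y(y) = -y / v.
    # x_hat = y + alpha * sigma^2 * score = c * y with c = 1 - alpha * sigma^2 / v,
    # so x_hat ~ N(0, c^2 v); W2 between centred 1D Gaussians is |std1 - std2|.
    v = s**2 + sigma**2
    c = 1.0 - alpha * sigma**2 / v
    return abs(abs(c) * math.sqrt(v) - s)

for alpha in (0.0, 0.5, 1.0):
    ratio = w2_gaussian_alpha(0.2, alpha) / w2_gaussian_alpha(0.1, alpha)
    print(f"alpha={alpha}: W2(sigma=0.1)={w2_gaussian_alpha(0.1, alpha):.2e}, "
          f"W2(0.2)/W2(0.1)={ratio:.1f}")
```

Halving $\sigma$ divides the error by roughly 4 for $\alpha = 0$ and $\alpha = 1$, but by roughly 16 for $\alpha = \frac12$.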

Figures (5)

  • Figure 1: Wasserstein distances for Gaussian distributions at different levels of noise. Left: linear scale, right: logarithmic scale.
  • Figure 2: Example of a smooth density with a compact support.
  • Figure 3: Wasserstein distances for Gaussian distribution supported on the subspace $\mathbb{R}^m\times\{0\}^{d-m}$ as target distribution and different noise levels $\sigma$.
  • Figure 4: Wasserstein distances for a mixture of two Dirac measures $\frac{\delta_{-\mu} + \delta_\mu}{2}$ as target distribution and different noise levels $\sigma$.
  • Figure 5: Illustration of the linear manifold hypothesis (a -- thick line), and the manifold hypothesis (b -- fine line).
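The subspace trade-off behind Figure 3 can be sketched in closed form, assuming the target is a standard Gaussian $\mathcal{N}(0, I_m)$ on $\mathbb{R}^m\times\{0\}^{d-m}$ (the identity covariance is our assumption) and the update $\hat x = y + \alpha\sigma^2\nabla\log p_Y(y)$ with $\alpha = 1$ (full) or $\alpha = \frac12$ (half). Off the subspace the noisy coordinates are pure noise with score $-y_2/\sigma^2$, so full-denoising maps them exactly to zero, while half-denoising leaves $y_2/2$, whose contribution to $W_2$ grows like $\frac{\sigma}{2}\sqrt{d-m}$ with the ambient dimension. The function name is illustrative.

```python
import math

def w2_subspace_gaussian(sigma: float, alpha: float, d: int, m: int) -> float:
    # Target: X = (X1, 0) with X1 ~ N(0, I_m), i.e. a standard Gaussian on
    # R^m x {0}^(d-m) (identity covariance is an assumption of this sketch).
    # Noisy data Y = X + sigma * Z splits into independent blocks:
    #   on-subspace  Y1 ~ N(0, (1 + sigma^2) I_m), score -y1 / (1 + sigma^2)
    #   off-subspace Y2 ~ N(0, sigma^2 I_(d-m)),  score -y2 / sigma^2
    v_on = 1.0 + sigma**2
    std_on = abs(1.0 - alpha * sigma**2 / v_on) * math.sqrt(v_on)
    std_off = abs(1.0 - alpha) * sigma  # x_hat2 = (1 - alpha) * y2
    # Both laws are centred Gaussians with diagonal covariances, so
    # W2^2 = sum over coordinates of (denoised std - target std)^2,
    # where the target std is 1 on the subspace and 0 off it.
    return math.sqrt(m * (std_on - 1.0) ** 2 + (d - m) * std_off**2)

for d in (2, 10, 100):
    full = w2_subspace_gaussian(0.1, 1.0, d, m=2)
    half = w2_subspace_gaussian(0.1, 0.5, d, m=2)
    print(f"d={d:3d}, m=2: W2 full={full:.3e}, half={half:.3e}")
```

With $d = 100$, $m = 2$ and $\sigma = 0.1$, the full-denoising error depends only on $m$ and stays below $0.01$, whereas the half-denoising error is about $0.49$; with $d = m$ the ordering flips back in favour of half-denoising.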

Theorems & Definitions (14)

  • Proposition 1
  • Corollary 2
  • Proposition 3
  • Lemma 4
  • Proposition 5
  • Proposition 6
  • Proposition 7
  • Proposition 8: Fokker-Planck equation
  • Proposition 9
  • Lemma 10
  • ...and 4 more