Blind denoising diffusion models and the blessings of dimensionality

Zahra Kadkhodaie, Aram-Alexandre Pooladian, Sinho Chewi, Eero Simoncelli

TL;DR

This paper analyzes blind denoising diffusion models (BDDMs) and shows that, when the data have low intrinsic dimensionality, a blind denoiser implicitly recovers the noise schedule and samples from the data distribution in a number of steps polynomial in the intrinsic dimension k. It provides a rigorous Bayesian interpretation, derives the implicit schedule σ_t^2 = σ_0^2 e^{−2t} + 2∫_0^t a_s e^{−2(t−s)} ds, and shows that careful discretization (e.g., exponential Euler with a_t = 𝒶 σ_t^2) yields stability and error bounds that depend on k rather than on the ambient dimension d. Empirically, BDDMs accurately estimate the noise variance from a single noisy image and produce higher-quality samples than non-blind baselines, as demonstrated on synthetic Gaussian mixtures and on real image datasets (CelebA and LSUN Bedroom). The findings suggest that BDDMs can simplify training and sampling by removing the need for explicit noise conditioning, which may benefit applications that require robust, perceptually faithful generation and inverse-problem solving.
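As a quick sanity check of the implicit schedule formula quoted above, the following minimal sketch evaluates it numerically for a constant injection rate a_s = a and compares the result with the closed form σ_t^2 = σ_0^2 e^{−2t} + a(1 − e^{−2t}) obtained by integrating directly. The function name `implicit_schedule`, the constants, and the trapezoid-rule quadrature are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def implicit_schedule(a, sigma0_sq, ts):
    """Evaluate sigma_t^2 = sigma_0^2 e^{-2t} + 2 * int_0^t a_s e^{-2(t-s)} ds.

    `a` is a vectorized callable s -> a_s (hypothetical helper signature);
    the integral is approximated with the trapezoid rule on the grid `ts`.
    """
    ts = np.asarray(ts, dtype=float)
    a_vals = a(ts)
    sigma_sq = np.empty_like(ts)
    for i, t in enumerate(ts):
        s = ts[: i + 1]
        f = a_vals[: i + 1] * np.exp(-2.0 * (t - s))
        # Trapezoid rule; evaluates to 0 when the grid has a single point (t = 0).
        integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(s))
        sigma_sq[i] = sigma0_sq * np.exp(-2.0 * t) + 2.0 * integral
    return sigma_sq


# Illustrative values (assumptions): initial variance and constant a_s.
SIGMA0_SQ = 4.0
A_CONST = 0.25

ts = np.linspace(0.0, 3.0, 601)
sigma_sq = implicit_schedule(lambda t: np.full_like(t, A_CONST), SIGMA0_SQ, ts)

# Closed form for constant a_s, from integrating the formula directly.
closed_form = SIGMA0_SQ * np.exp(-2.0 * ts) + A_CONST * (1.0 - np.exp(-2.0 * ts))
max_err = float(np.max(np.abs(sigma_sq - closed_form)))
print(f"max abs deviation from closed form: {max_err:.2e}")
```

With these values a_0 ≤ σ_0^2, so the computed variance decays monotonically from σ_0^2 toward a, which matches the behavior described in Lemma 3.1.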

Abstract

We analyze, theoretically and empirically, the performance of generative diffusion models based on blind denoisers, in which the denoiser is not given the noise amplitude in either the training or sampling processes. Assuming that the data distribution has low intrinsic dimensionality, we prove that blind denoising diffusion models (BDDMs), despite not having access to the noise amplitude, automatically track a particular implicit noise schedule along the reverse process. Our analysis shows that BDDMs can accurately sample from the data distribution in polynomially many steps as a function of the intrinsic dimension. Empirical results corroborate these mathematical findings on both synthetic and image data, demonstrating that the noise variance is accurately estimated from the noisy image. Remarkably, we observe that schedule-free BDDMs produce samples of higher quality compared to their non-blind counterparts. We provide evidence that this performance gain arises because BDDMs correct the mismatch between the true residual noise (of the image) and the noise assumed by the schedule used in non-blind diffusion models.

Paper Structure (45 sections, 11 theorems, 133 equations, 12 figures, 3 algorithms)

Key Result

Lemma 3.1

If $t\mapsto a_t$ is decreasing and $a_0 \le \sigma_0^2$, then $t\mapsto \sigma_t$ is decreasing.
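
One way to see why the lemma holds, sketched from the implicit schedule formula in the TL;DR and not necessarily the paper's own proof: assume $a_t$ is continuous and read "decreasing" as non-increasing. Differentiating the schedule gives a linear ODE, and the two hypotheses give $\sigma_t^2 \ge a_t$, which makes the derivative non-positive.

```latex
\begin{align*}
\frac{d}{dt}\sigma_t^2
  &= -2\sigma_t^2 + 2a_t, \\
\sigma_t^2
  &= \sigma_0^2 e^{-2t} + 2\int_0^t a_s\, e^{-2(t-s)}\,ds \\
  &\ge a_0\, e^{-2t} + 2a_t \int_0^t e^{-2(t-s)}\,ds
     && (\sigma_0^2 \ge a_0,\ a_s \ge a_t \text{ for } s \le t) \\
  &\ge a_t\, e^{-2t} + a_t\bigl(1 - e^{-2t}\bigr) = a_t
     && (a_0 \ge a_t).
\end{align*}
```

Hence $\frac{d}{dt}\sigma_t^2 = 2(a_t - \sigma_t^2) \le 0$, so $\sigma_t^2$, and therefore $\sigma_t \ge 0$, is non-increasing in $t$.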

Figures (12)

  • Figure 1: Empirical density of maximum likelihood noise level estimates $\widehat{\sigma} = \arg\max \mu(\sigma|y)$ for a mixture of two Gaussians with intrinsic dimensionality $\mathsf{k} = 2$ in an ambient space of dimensionality $d$. Estimates are broadly distributed when $d = \mathsf{k}$ (left), but highly concentrated when $d \gg \mathsf{k}^2$ (right).
  • Figure 2: Example trajectories and samples for an analytical blind denoiser applied to a mixture of two Gaussians with $\mathsf{k} = 2$. For $d = \mathsf{k} = 2$ (left), sampling fails due to errors in the MLE estimates of the noise level. For $d = 500 \gg \mathsf{k}^2$ (right), sampling is successful, illustrating the blessings of dimensionality.
  • Figure 3: Sampling performance of BDDMs trained on Gaussian data with intrinsic dimension $\mathsf{k} = 2$ and two different input dimensions $d \in \{2, 100\}$. Generated samples (left) and evolution of the estimated noise level, corresponding to Proposition \ref{prop:opt_schedule} (right).
  • Figure 4: Top panel. Comparison of denoising performance of blind and non-blind denoisers. Performance is evaluated on the test sets of CelebA and the Bedroom class of LSUN, and is reported in terms of PSNR. For consistency, the noise level on the horizontal axis is also expressed as ${\rm PSNR}(x,x_\sigma)$. Bottom panel. An example test image with corresponding PSNR values. Top: Noisy images. Middle: Image denoised by a non-blind denoiser. Bottom: Image denoised by a blind denoiser.
  • Figure 5: Comparison of samples from BDDM and VE-DDPM. Top row: Randomly selected subset of training images from the CelebA dataset. Second row: Samples generated by BDDM with $N \approx 100$. Third row: Samples generated by a non-blind DDM (VE-DDPM) with $N = 100$. Samples in each column are initialized with the same random seed and use matched injected noise. Seeds are random and not curated for quality.
  • ...and 7 more figures

Theorems & Definitions (23)

  • Lemma 3.1
  • Lemma 3.2
  • Definition 3.3: Intrinsic dimension
  • Lemma 3.4
  • Remark 3.5
  • Proposition B.1
  • Proof
  • Proof of Claim 1
  • Proof of Claim 2
  • Lemma C.1
  • ...and 13 more