
The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning

Mojtaba Sahraee-Ardakan, Mauricio Delbracio, Peyman Milanfar

TL;DR

The paper proves that velocity-based parameterizations are inherently stable, because they satisfy a bounded-gain condition that absorbs posterior uncertainty into a smooth geometric drift, and it establishes the structural stability conditions for sampling with autonomous models.

Abstract

Autonomous (noise-agnostic) generative models, such as Equilibrium Matching and blind diffusion, challenge the standard paradigm by learning a single, time-invariant vector field that operates without explicit noise-level conditioning. While recent work suggests that high-dimensional concentration allows these models to implicitly estimate noise levels from corrupted observations, a fundamental paradox remains: what is the underlying landscape being optimized when the noise level is treated as a random variable, and how can a bounded, noise-agnostic network remain stable near the data manifold where gradients typically diverge? We resolve this paradox by formalizing Marginal Energy, $E_{\text{marg}}(\mathbf{u}) = -\log p(\mathbf{u})$, where $p(\mathbf{u}) = \int p(\mathbf{u}|t)p(t)\,dt$ is the marginal density of the noisy data integrated over a prior distribution of unknown noise levels. We prove that generation using autonomous models is not merely blind denoising, but a specific form of Riemannian gradient flow on this Marginal Energy. Through a novel relative energy decomposition, we demonstrate that while the raw Marginal Energy landscape possesses a $1/t^p$ singularity normal to the data manifold, the learned time-invariant field implicitly incorporates a local conformal metric that perfectly counteracts the geometric singularity, converting an infinitely deep potential well into a stable attractor. We also establish the structural stability conditions for sampling with autonomous models. We identify a "Jensen Gap" in noise-prediction parameterizations that acts as a high-gain amplifier for estimation errors, explaining the catastrophic failure observed in deterministic blind models. Conversely, we prove that velocity-based parameterizations are inherently stable because they satisfy a bounded-gain condition that absorbs posterior uncertainty into a smooth geometric drift.
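The gradient singularity of the Marginal Energy, and how a noise-variance factor cancels it, can be reproduced in a small toy setting. The sketch below is a minimal illustration under assumptions that do not come from the paper: a single clean point at the origin of $\mathbb{R}^D$ with $D = 16$, the corruption $\mathbf{u} = \mathbf{x} + t\boldsymbol{\epsilon}$, and a uniform prior on $t \in [10^{-4}, 1]$ discretized on a grid. Differentiating $-\log p(\mathbf{u})$ under the integral gives $\nabla E_{\text{marg}}(\mathbf{u}) = \mathbb{E}[t^{-2} \mid \mathbf{u}]\,\mathbf{u}$ for this kernel. Multiplying by the posterior noise variance $\mathbb{E}[t^{2} \mid \mathbf{u}]$ is used here only as an illustrative stand-in for the paper's conformal metric; the names `T_GRID`, `noise_posterior`, and `gradient_norms` are hypothetical.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): one clean point at the origin
# of R^D, corruption u = x + t * eps, and a uniform prior on t in [1e-4, 1].
D = 16
T_GRID = np.linspace(1e-4, 1.0, 200_001)


def noise_posterior(r):
    """p(t | u) on the grid; for this kernel it depends on u only via r = ||u||."""
    # log N(u; 0, t^2 I_D) with t-independent constants dropped
    logw = -D * np.log(T_GRID) - r**2 / (2.0 * T_GRID**2)
    w = np.exp(logw - logw.max())  # subtract the max before exponentiating
    return w / w.sum()


def gradient_norms(r):
    """Return (||grad E_marg(u)||, ||E[t^2|u] * grad E_marg(u)||) at ||u|| = r."""
    post = noise_posterior(r)
    raw = r * (post @ (1.0 / T_GRID**2))   # grad E_marg(u) = E[t^-2 | u] * u
    preconditioned = (post @ T_GRID**2) * raw  # rescale by E[t^2 | u]
    return float(raw), float(preconditioned)


for r in (1.0, 0.1, 0.01):
    raw, pre = gradient_norms(r)
    print(f"||u||={r:<5}  raw={raw:10.2f}  preconditioned={pre:.4f}")
```

In this toy the raw gradient norm grows roughly like $(D-1)/\|\mathbf{u}\| = 15/\|\mathbf{u}\|$ as $\mathbf{u}$ approaches the clean point, while the rescaled field shrinks linearly, to about $1.15\,\|\mathbf{u}\|$. The leftover factor $\mathbb{E}[t^2|\mathbf{u}]\,\mathbb{E}[t^{-2}|\mathbf{u}] \ge 1$ (Jensen's inequality) stays bounded, close to $(D-1)/(D-3)$ for this particular prior.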

Paper Structure

This paper contains 44 sections, 7 theorems, 72 equations, 5 figures, and 3 tables.

Key Result

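Lemma 1 (Optimal Autonomous Target)

Consider the mean-squared-error loss functional $\mathcal{L}(f)$ defined in the paper (eq:mse_loss). The unique global minimizer $f^*(\mathbf{u})$ is the expectation of the target conditioned on the noise level $t$, weighted by the posterior $p(t|\mathbf{u})$:

$$f^*(\mathbf{u}) = \int p(t|\mathbf{u})\, \mathbb{E}\left[\text{target} \mid \mathbf{u}, t\right] dt.$$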
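Lemma 1 can be checked numerically on a small example. The sketch below assumes a hypothetical one-dimensional toy problem that is not from the paper: $u = x + t\epsilon$ with $x$ uniform on $\{-1, +1\}$, $t$ uniform on $\{0.5, 1, 2\}$, and the noise $\epsilon$ as the regression target (the lemma itself applies to any target). It compares the closed-form posterior-weighted average with a Monte Carlo estimate of the MSE minimizer, obtained by averaging the target over samples whose $u$ falls in a narrow bin. All function names are illustrative.

```python
import numpy as np

# Hypothetical toy problem (not the paper's setup): u = x + t * eps with
# x uniform on {-1, +1}, t uniform on a few noise levels, eps ~ N(0, 1).
X_VALUES = np.array([-1.0, 1.0])
T_VALUES = np.array([0.5, 1.0, 2.0])


def gaussian_pdf(u, mean, std):
    return np.exp(-0.5 * ((u - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def conditional_targets(u):
    """E[eps | u, t] for every noise level t (the noise-conditional optimum)."""
    lik = gaussian_pdf(u, X_VALUES[None, :], T_VALUES[:, None])  # shape (T, X)
    post_x = lik / lik.sum(axis=1, keepdims=True)                 # p(x | u, t)
    x_mean = post_x @ X_VALUES                                    # E[x | u, t]
    return (u - x_mean) / T_VALUES


def noise_posterior(u):
    """p(t | u), using the uniform priors on x and t."""
    p_u_given_t = gaussian_pdf(u, X_VALUES[None, :], T_VALUES[:, None]).mean(axis=1)
    return p_u_given_t / p_u_given_t.sum()


def optimal_autonomous_target(u):
    """Posterior-weighted average of the conditional targets, as in Lemma 1."""
    return float(noise_posterior(u) @ conditional_targets(u))


def monte_carlo_minimizer(u0, n=2_000_000, half_width=0.02, seed=0):
    """Empirical MSE minimizer at u0: average eps over samples with u close to u0."""
    rng = np.random.default_rng(seed)
    x = rng.choice(X_VALUES, size=n)
    t = rng.choice(T_VALUES, size=n)
    eps = rng.standard_normal(n)
    u = x + t * eps
    return float(eps[np.abs(u - u0) < half_width].mean())


u0 = 0.5
print(optimal_autonomous_target(u0))    # approximately -0.284
print(monte_carlo_minimizer(u0))        # close to the value above
print(conditional_targets(u0).mean())   # prior-weighted average, approximately -0.234
```

Averaging the per-noise-level optima with the prior weights instead of the posterior gives a noticeably different value in this toy, so the posterior weighting in the lemma is not a cosmetic detail.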

Figures (5)

  • Figure 1: The Singular Geometry of the Marginal Energy Landscape. (Left) 3D Energy Landscape: A visualization of the marginal energy $E_{\text{marg}}(\mathbf{u}) = -\log p(\mathbf{u})$. The landscape reveals an infinitely deep potential well at the data manifold, where the energy diverges to $-\infty$. (Right) Contour View: Top-down perspective showing the energy concentration around discrete data points (stars). While the raw gradient $\nabla_{\mathbf{u}} E_{\text{marg}}(\mathbf{u})$ becomes singular as $\mathbf{u}$ approaches the clean data, in this work we prove that autonomous models remain stable by implicitly implementing a Riemannian gradient flow. In this framework, the posterior noise variance acts as a local conformal metric that preconditions and perfectly counteracts the geometric singularity.
  • Figure 2: Generative performance on CIFAR-10. Top: DDPM Blind exhibits structural instability and noise. Bottom: Flow Matching Blind achieves stable generation, matching the performance of conditioned models.
  • Figure 3: Generative performance on SVHN (Street View House Numbers). Top: DDPM Blind exhibits structural instability and noise. Bottom: Flow Matching Blind achieves stable generation, matching the performance of conditioned models.
  • Figure 4: Generative performance on Fashion MNIST. Top: DDPM Blind exhibits structural instability and noise. Bottom: Flow Matching Blind achieves stable generation, matching the performance of conditioned models.
  • Figure 5: Generative performance on a 2D concentric circles dataset embedded in $\mathbb{R}^D$. Conditional models receive explicit time embeddings, while autonomous (blind) models must implicitly infer the noise scale. At low dimensions ($D=2$, top), blind models fail due to overlapping noise distributions. At moderate dimensions ($D=8, 32$, middle), Flow Matching achieves stable generation due to its bounded velocity parameterization, while DDPM Blind exhibits severe noise artifacts due to its singular gain $O(1/b(t))$. At extreme dimensions ($D=128$, bottom), absolute geometric concentration forces the estimation error to zero, allowing both blind models to converge.
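The dimension dependence described for Figure 5 can be illustrated by computing the posterior over the noise level directly. The sketch below swaps the concentric-circles data for a single clean point at the origin (an assumption made only to keep the likelihood in closed form), takes $\mathbf{u} = t_0\boldsymbol{\epsilon}$ with $t_0 = 0.5$ placed at its typical radius $\|\mathbf{u}\| = t_0\sqrt{D}$, and uses a uniform prior on $t \in [0.01, 1]$. It reports the posterior mean and spread of $t$ for the dimensions used in the figure; `noise_posterior_stats` is a hypothetical helper, not part of the paper.

```python
import numpy as np

T0 = 0.5                                  # true noise level (assumed)
T_GRID = np.linspace(0.01, 1.0, 100_001)  # discretized uniform prior p(t) (assumed)


def noise_posterior_stats(dim, t0=T0):
    """Posterior mean and std of t given ||u|| = t0 * sqrt(dim), a typical draw of u = t0 * eps."""
    r2 = dim * t0**2
    # log N(u; 0, t^2 I_dim) up to a t-independent constant
    logw = -dim * np.log(T_GRID) - r2 / (2.0 * T_GRID**2)
    w = np.exp(logw - logw.max())
    post = w / w.sum()
    mean = post @ T_GRID
    std = np.sqrt(post @ (T_GRID - mean) ** 2)
    return float(mean), float(std)


for dim in (2, 8, 32, 128):
    mean, std = noise_posterior_stats(dim)
    print(f"D={dim:4d}  E[t|u]={mean:.3f}  std[t|u]={std:.3f}")
```

In this toy the posterior spread shrinks roughly like $t_0/\sqrt{2D}$ once $D$ is moderately large, so the noise level can be read off $\|\mathbf{u}\|$ alone in high dimension, while at $D = 2$ the posterior stays broad.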

Theorems & Definitions (14)

  • Lemma 1: Optimal Autonomous Target
  • proof
  • Lemma 2: Denoiser Formulation
  • proof
  • Lemma 3: Gradient of the Marginal Energy
  • proof
  • Lemma 4: Concentration of the Inverse-Gamma Kernel
  • proof
  • Lemma 5
  • proof
  • ...and 4 more