The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning

Mojtaba Sahraee-Ardakan; Mauricio Delbracio; Peyman Milanfar

TL;DR

We prove that velocity-based parameterizations are inherently stable because they satisfy a bounded-gain condition that absorbs posterior uncertainty into a smooth geometric drift, and we establish the structural stability conditions for sampling with autonomous models.

Abstract

Autonomous (noise-agnostic) generative models, such as Equilibrium Matching and blind diffusion, challenge the standard paradigm by learning a single, time-invariant vector field that operates without explicit noise-level conditioning. While recent work suggests that high-dimensional concentration allows these models to implicitly estimate noise levels from corrupted observations, a fundamental paradox remains: what is the underlying landscape being optimized when the noise level is treated as a random variable, and how can a bounded, noise-agnostic network remain stable near the data manifold, where gradients typically diverge? We resolve this paradox by formalizing the Marginal Energy, $E_{\text{marg}}(\mathbf{u}) = -\log p(\mathbf{u})$, where $p(\mathbf{u}) = \int p(\mathbf{u}|t)\,p(t)\,dt$ is the marginal density of the noisy data integrated over a prior distribution of unknown noise levels. We prove that generation using autonomous models is not merely blind denoising, but a specific form of Riemannian gradient flow on this Marginal Energy. Through a novel relative energy decomposition, we demonstrate that while the raw Marginal Energy landscape possesses a $1/t^p$ singularity normal to the data manifold, the learned time-invariant field implicitly incorporates a local conformal metric that perfectly counteracts the geometric singularity, converting an infinitely deep potential well into a stable attractor. We also establish the structural stability conditions for sampling with autonomous models. We identify a "Jensen Gap" in noise-prediction parameterizations that acts as a high-gain amplifier for estimation errors, explaining the catastrophic failure observed in deterministic blind models. Conversely, we prove that velocity-based parameterizations are inherently stable because they satisfy a bounded-gain condition that absorbs posterior uncertainty into a smooth geometric drift.
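To make the Marginal Energy and its conformal preconditioning concrete, here is a minimal numerical sketch under simplifying assumptions that are ours, not the paper's: a single clean point $\mathbf{x}_0$, Gaussian corruption $\mathbf{u} = \mathbf{x}_0 + \sigma\boldsymbol{\epsilon}$ in $D$ dimensions, and a uniform prior on $\sigma \in [\sigma_{\min}, \sigma_{\max}]$ integrated on a grid. In this toy case the gradient factorizes as $\nabla_{\mathbf{u}} E_{\text{marg}}(\mathbf{u}) = \mathbb{E}[\sigma^{-2}\mid\mathbf{u}]\,(\mathbf{u}-\mathbf{x}_0)$, so the raw gradient grows sharply as $\mathbf{u}$ approaches $\mathbf{x}_0$, while rescaling it by $1/\mathbb{E}[\sigma^{-2}\mid\mathbf{u}]$ (a posterior noise-variance scale, used here only as a stand-in for the paper's conformal metric) recovers the bounded step $\mathbf{u}-\mathbf{x}_0$. The function names, grid size, and prior are illustrative assumptions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginal_energy_1pt(r, dim=2, sigma_min=1e-3, sigma_max=1.0, n=4000):
    """Toy marginal energy at distance r = ||u - x0|| from one clean point x0.

    Assumes u = x0 + sigma * eps with eps ~ N(0, I_dim) and a uniform prior on
    sigma in [sigma_min, sigma_max], integrated with the midpoint rule.
    Returns (E_marg, E[sigma^-2 | u]).
    """
    h = (sigma_max - sigma_min) / n
    sigmas = [sigma_min + (k + 0.5) * h for k in range(n)]
    # log p(u | sigma) for an isotropic Gaussian centred at x0.
    log_lik = [
        -0.5 * dim * math.log(2.0 * math.pi * s * s) - r * r / (2.0 * s * s)
        for s in sigmas
    ]
    log_norm = logsumexp(log_lik)
    # p(u) ~= sum_k p(u | sigma_k) * p(sigma_k) * h, with p(sigma) = 1 / (sigma_max - sigma_min).
    log_pu = log_norm + math.log(h) - math.log(sigma_max - sigma_min)
    # Posterior weights p(sigma_k | u) give the posterior mean of sigma^-2.
    inv_var = sum(math.exp(ll - log_norm) / (s * s) for ll, s in zip(log_lik, sigmas))
    return -log_pu, inv_var


def raw_and_preconditioned_grad(r, **kwargs):
    """Radial derivative of E_marg (central finite difference) and its rescaled version."""
    delta = 1e-4 * r
    e_plus, _ = marginal_energy_1pt(r + delta, **kwargs)
    e_minus, _ = marginal_energy_1pt(r - delta, **kwargs)
    raw = (e_plus - e_minus) / (2.0 * delta)
    _, inv_var = marginal_energy_1pt(r, **kwargs)
    # Dividing by E[sigma^-2 | u] should give back r = ||u - x0||, a bounded step.
    return raw, raw / inv_var


for r in (0.5, 0.1, 0.02):
    raw, pre = raw_and_preconditioned_grad(r)
    print(f"r={r:<5} raw dE/dr={raw:9.3f}  preconditioned={pre:.5f}")
```

As $r$ shrinks (down to the assumed noise floor $\sigma_{\min}$), the raw derivative grows sharply, while the preconditioned value tracks $r$ and stays bounded. Using a finite difference of the energy, rather than the closed-form factorization, keeps the check independent of the formula it verifies.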


Paper Structure

This paper contains 44 sections, 7 theorems, 72 equations, 5 figures, and 3 tables.

Introduction
Related Work
Noise-Blind Denoising.
Energy Landscapes & Singularities.
Equilibrium Dynamics & Flow.
Preliminaries: A Unified Schedule Formulation
The Geometry of the Marginal Energy
The Energy Paradox
Case 1: Attractors (EqM, EDM).
Case 2: Transversal Flows (Flow Matching).
The Singularity.
Autonomous Generation as Riemannian Gradient Flow
The Anatomy of the Autonomous Field
The Energy-Aligned Decomposition.
Regime I: Global Concentration in High Dimensions
...and 29 more sections

Key Result

Lemma 1 (Optimal Autonomous Target)

Consider the loss functional $\mathcal{L}(f)$ defined in Eq. (eq:mse_loss). The unique global minimizer $f^*(\mathbf{u})$ is the expectation of the target conditioned on the noise level $t$, weighted by the posterior $p(t|\mathbf{u})$:

$f^*(\mathbf{u}) = \int p(t|\mathbf{u})\, \mathbb{E}[\,\text{target} \mid \mathbf{u}, t\,]\, dt$
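Lemma 1 can be checked numerically in a toy setting. The sketch below assumes, purely for illustration, a single clean point $x_0$ in one dimension, a few discrete noise levels $\sigma_t$ with a uniform prior, corruption $u = x_0 + \sigma_t\epsilon$, and a noise-prediction target $\epsilon = (u - x_0)/\sigma_t$, which is deterministic once $u$ and $t$ are fixed. The helper names are hypothetical. The posterior-weighted average of the per-level targets should minimize the expected squared error at a fixed $u$.

```python
import math


def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def optimal_autonomous_target(u, sigmas, prior, x0=0.0):
    """Posterior-weighted target f*(u) = sum_t p(t|u) E[target | u, t] (toy case)."""
    # Bayes' rule over the discrete noise levels: p(t|u) is proportional to p(u|t) p(t).
    joint = [gaussian_pdf(u, x0, s) * p for s, p in zip(sigmas, prior)]
    z = sum(joint)
    posterior = [j / z for j in joint]
    # With a single clean point, the noise target is fixed once u and t are known.
    cond_targets = [(u - x0) / s for s in sigmas]
    f_star = sum(p * c for p, c in zip(posterior, cond_targets))
    return f_star, posterior, cond_targets


def expected_loss(c, posterior, cond_targets):
    """Expected squared error of a single prediction c at this u, averaged over p(t|u)."""
    return sum(p * (c - tgt) ** 2 for p, tgt in zip(posterior, cond_targets))


sigmas = [0.1, 0.5, 1.0]
prior = [1.0 / 3.0] * 3
f_star, post, targets = optimal_autonomous_target(0.3, sigmas, prior, x0=0.0)
print("posterior p(t|u):", [round(p, 3) for p in post])
print("per-level targets:", targets)
print("f*(u):", round(f_star, 4))
```

With realistic data, $\mathbb{E}[\text{target}\mid u, t]$ is itself a posterior mean over clean samples rather than a fixed number, but the outer averaging over $p(t|u)$ works the same way, because the extra conditional variance does not depend on the prediction.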

Figures (5)

Figure 1: The Singular Geometry of the Marginal Energy Landscape. (Left) 3D Energy Landscape: A visualization of the marginal energy $E_{\text{marg}}(\mathbf{u}) = -\log p(\mathbf{u})$. The landscape reveals an infinitely deep potential well at the data manifold, where the energy diverges to $-\infty$. (Right) Contour View: Top-down perspective showing the energy concentration around discrete data points (stars). While the raw gradient $\nabla_{\mathbf{u}} E_{\text{marg}}(\mathbf{u})$ becomes singular as $\mathbf{u}$ approaches the clean data, in this work we prove that autonomous models remain stable by implicitly implementing a Riemannian gradient flow. In this framework, the posterior noise variance acts as a local conformal metric that preconditions the gradient and perfectly counteracts the geometric singularity.
Figure 2: Generative performance on CIFAR-10. Top: DDPM Blind exhibits structural instability and noise. Bottom: Flow Matching Blind achieves stable generation, matching the performance of conditioned models.
Figure 3: Generative performance on SVHN (Street View House Numbers). Top: DDPM Blind exhibits structural instability and noise. Bottom: Flow Matching Blind achieves stable generation, matching the performance of conditioned models.
Figure 4: Generative performance on Fashion MNIST. Top: DDPM Blind exhibits structural instability and noise. Bottom: Flow Matching Blind achieves stable generation, matching the performance of conditioned models.
Figure 5: Generative performance on a 2D concentric circles dataset embedded in $\mathbb{R}^D$. Conditional models receive explicit time embeddings, while autonomous (blind) models must implicitly infer the noise scale. At low dimensions ($D=2$, top), blind models fail due to overlapping noise distributions. At moderate dimensions ($D=8, 32$, middle), Flow Matching achieves stable generation due to its bounded velocity parameterization, while DDPM Blind exhibits severe noise artifacts due to its singular gain $O(1/b(t))$. At extreme dimensions ($D=128$, bottom), absolute geometric concentration forces the estimation error to zero, allowing both blind models to converge.
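The dimension dependence in Figure 5 is consistent with a standard concentration effect: for $\boldsymbol{\epsilon} \sim \mathcal{N}(0, I_D)$, the statistic $\|\sigma\boldsymbol{\epsilon}\|/\sqrt{D}$ concentrates around $\sigma$ with a relative spread of roughly $1/\sqrt{2D}$, so the noise level becomes easier to read off a corrupted sample as $D$ grows. The Monte Carlo sketch below illustrates only this effect. It assumes the residual $\mathbf{u}-\mathbf{x}_0 = \sigma\boldsymbol{\epsilon}$ is observed directly, which a blind model does not have, and the function name and parameters are hypothetical.

```python
import math
import random


def noise_level_spread(dim, sigma=0.5, trials=4000, seed=0):
    """Monte Carlo mean and relative spread of ||sigma * eps|| / sqrt(dim)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Squared norm of a dim-dimensional Gaussian noise vector with std sigma.
        sq_norm = sum((sigma * rng.gauss(0.0, 1.0)) ** 2 for _ in range(dim))
        estimates.append(math.sqrt(sq_norm / dim))
    mean = sum(estimates) / trials
    var = sum((e - mean) ** 2 for e in estimates) / (trials - 1)
    return mean, math.sqrt(var) / mean


for dim in (2, 8, 32, 128):
    mean, rel = noise_level_spread(dim)
    print(f"D={dim:<4} mean estimate={mean:.4f}  relative spread={rel:.4f}  "
          f"1/sqrt(2D)={1 / math.sqrt(2 * dim):.4f}")
```

At $D=2$ the spread is large enough that different noise levels produce overlapping residual norms, which matches the low-dimensional failure described in the caption; by $D=128$ the estimate is tightly concentrated around the true $\sigma$.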

Theorems & Definitions (14)

Lemma 1: Optimal Autonomous Target
proof
Lemma 2: Denoiser Formulation
proof
Lemma 3: Gradient of the Marginal Energy
proof
Lemma 4: Concentration of the Inverse-Gamma Kernel
proof
Lemma 5
proof
...and 4 more
