Diffusion Models and the Manifold Hypothesis: Log-Domain Smoothing is Geometry Adaptive

Tyler Farghly, Peter Potaptchik, Samuel Howard, George Deligiannidis, Jakiw Pidstrigach

TL;DR

The paper investigates diffusion models through the lens of the manifold hypothesis and argues that smoothing the score, which is equivalent to smoothing in the log-density domain, acts as geometry-adaptive regularisation. It establishes that log-domain smoothing aligns with manifold structure, proving exact equivalence to manifold-adapted smoothing in the linear (affine) setting and providing Rényi-divergence bounds for curved manifolds. The theory is complemented by high-dimensional experiments in latent and pixel spaces, showing that score-smoothed diffusion spreads mass along the data manifold rather than away from it and can interpolate along geometries consistent with the underlying structure. A key takeaway is that the smoothing kernel induces a geometric bias, enabling controlled generalisation along chosen manifolds, with practical implications for generation quality and diversity. The work also discusses limitations and future directions, such as relaxing assumptions about kernels and curvature and exploring how architecture influences smoothing behaviour.

Abstract

Diffusion models have achieved state-of-the-art performance, demonstrating remarkable generalisation capabilities across diverse domains. However, the mechanisms underpinning these strong capabilities remain only partially understood. A leading conjecture, based on the manifold hypothesis, attributes this success to their ability to adapt to low-dimensional geometric structure within the data. This work provides evidence for this conjecture, focusing on how such phenomena could result from the formulation of the learning problem through score matching. We inspect the role of implicit regularisation by investigating the effect of smoothing minimisers of the empirical score matching objective. Our theoretical and empirical results confirm that smoothing the score function -- or equivalently, smoothing in the log-density domain -- produces smoothing tangential to the data manifold. In addition, we show that the manifold along which the diffusion model generalises can be controlled by choosing an appropriate smoothing.
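
The equivalence between score smoothing and log-domain smoothing mentioned in the abstract can be written out in one line. The following is only a sketch under assumptions not spelled out in this summary: a translation-invariant kernel $k$ (such as the Gaussian $\mathcal{N}_{\sigma}$), enough regularity to differentiate under the integral, and a symbol $\tilde{p}_t$ introduced here purely for illustration to denote the log-domain smoothed density.

```latex
\log \tilde{p}_t := (\log \hat{p}_t) \ast k + \text{const}
\quad\Longrightarrow\quad
\nabla \log \tilde{p}_t
  = \nabla \big( (\log \hat{p}_t) \ast k \big)
  = (\nabla \log \hat{p}_t) \ast k .
```

Under these assumptions, sampling with the smoothed score amounts to sampling with the score of a density whose logarithm has been smoothed. For kernels that vary with location, the interchange of gradient and convolution is not automatic.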

Paper Structure

This paper contains 57 sections, 16 theorems, 160 equations, 19 figures.

Key Result

Proposition 3.1

The log-domain smoothed density satisfies the property, where $P := I-A^T A$ is the projection onto $\text{Null}(A) = \{x \in \mathbb{R}^d: Ax = 0\}$.
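
The displayed property of Proposition 3.1 is not reproduced in this summary, so the sketch below only illustrates the projection $P$ that it refers to. It assumes that $A$ has orthonormal rows ($A A^T = I$), which is what makes $I - A^T A$ an orthogonal projection onto $\text{Null}(A)$; the dimensions `d` and `m` and the random construction of `A` are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 2  # hypothetical ambient dimension and number of rows of A

# Assumption: A has orthonormal rows (A A^T = I). We build such an A from
# the reduced QR factorisation of a random d x m matrix.
Q, _ = np.linalg.qr(rng.standard_normal((d, m)))  # Q has orthonormal columns
A = Q.T                                           # shape (m, d)

# P := I - A^T A, the matrix named in Proposition 3.1.
P = np.eye(d) - A.T @ A

x = rng.standard_normal(d)
Px = P @ x

print(np.allclose(P @ P, P))      # P is idempotent
print(np.allclose(P, P.T))        # P is symmetric, hence an orthogonal projection
print(np.allclose(A @ Px, 0.0))   # P x lies in Null(A)
print(np.linalg.matrix_rank(P))   # dimension of Null(A), here d - m
```

If the rows of `A` were not orthonormal, `I - A.T @ A` would in general fail the idempotence check, which is why the assumption is stated explicitly.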

Figures (19)

  • Figure 1: Isotropic smoothing of the score function identifies manifold structure. The figure shows training data ($\textcolor{red}{\blacktriangle}$) against generated samples ($\textcolor{MidnightBlue}{\bullet}$) from a diffusion model that is run with the smoothed score $\nabla \log \hat{p}_t \ast \mathcal{N}_{\sigma}$, where the width of the Gaussian smoothing kernel increases from $\sigma = 0.02$ to $\sigma = 0.12$. Notice that for low amounts of smoothing, generated samples are concentrated close to training data and as $\sigma$ increases, generated samples begin to fill out more of the manifold without having seen training samples in those regions.
  • Figure 2: Density smoothing generates samples off-manifold, whereas score smoothing generates samples that retain manifold structure. Left: The plots compare samples ($\textcolor{blue}{\bullet}$) drawn from a KDE (top) versus from a diffusion model with the smoothed score (bottom) from Figure \ref{fig:lima_bean} (training data is $\textcolor{red}{\bullet}$). The scale of the smoothing kernel increases from left to right. Right: 1D intuition for data-domain versus log-domain smoothing. The left sub-figure shows the Gaussian ($\textcolor{blue}{-}$) smoothed in data-domain ($\textcolor{orange}{-}$), and the right sub-figure shows the Gaussian smoothed in log-domain ($\textcolor{darkgreen}{-}$) with the same kernel.
  • Figure 3: The choice of smoothing kernel influences the manifold on which generated samples lie. The empirical score function corresponding to the training data ($\textcolor{red}{\blacktriangle}$) is smoothed with different (data-dependent) kernels. To visualize the smoothing kernels, we generate samples ($\textcolor{LimeGreen}{\bullet}$) from $k_x$. We use the smoothed score functions to generate samples ($\textcolor{MidnightBlue}{\bullet}$) from the resulting diffusion models. Notice that despite using the same training data, different smoothing kernels generate samples that lie on different manifolds.
  • Figure 4: Score smoothing can promote generalisation along curved manifolds, but too much smoothing can distort the desired structure. Left: Training data ($\textcolor{red}{\blacktriangle}$) against generated samples ($\textcolor{MidnightBlue}{\bullet}$) using isotropic Gaussian score smoothing with variance $\sigma^2$. Right: Corresponding population negative log-likelihood, calculated for 1000 points on the true circular manifold. See Appendix \ref{app:circle_details} for details.
  • Figure 5: Different smoothing kernels can isolate alternative manifolds, given the same training data. Training data ($\textcolor{red}{\blacktriangle}$) against generated samples ($\textcolor{MidnightBlue}{\bullet}$) using isotropic Gaussian score smoothing. By changing the smoothing variance $\sigma^2$, different geometries are realised.
  • ...and 14 more figures
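
The 1D contrast between data-domain and log-domain smoothing described in the Figure 2 caption can be checked numerically. This is a minimal sketch, not the paper's experiment: it assumes a zero-mean Gaussian density with standard deviation `s`, a Gaussian kernel of width `sigma`, and Gauss-Hermite quadrature for the convolution. All names and numerical values are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature nodes/weights for the weight exp(-z^2 / 2); after normalising,
# sum_i w_i f(z_i) approximates E[f(Z)] with Z ~ N(0, 1).
nodes, weights = hermegauss(40)
weights = weights / weights.sum()

def gaussian_smooth(f, x, sigma):
    """Approximate (f * N(0, sigma^2))(x) = E[f(x + sigma * Z)] pointwise."""
    return (weights * f(x[:, None] + sigma * nodes)).sum(axis=1)

s, sigma = 0.5, 0.3                 # illustrative scales, not from the paper
x = np.linspace(-6.0, 6.0, 2001)
dx = x[1] - x[0]

def log_p(u):
    return -u**2 / (2 * s**2) - 0.5 * np.log(2 * np.pi * s**2)

def p(u):
    return np.exp(log_p(u))

def variance(density):
    """Variance of a (possibly unnormalised) density sampled on the grid x."""
    density = density / (density.sum() * dx)
    mean = (x * density).sum() * dx
    return ((x - mean) ** 2 * density).sum() * dx

# Data-domain smoothing: convolve the density itself (what a KDE does).
var_data = variance(gaussian_smooth(p, x, sigma))

# Log-domain smoothing: convolve log p, exponentiate, renormalise.
var_log = variance(np.exp(gaussian_smooth(log_p, x, sigma)))

print(var_data)  # close to s**2 + sigma**2 = 0.34
print(var_log)   # close to s**2 = 0.25
```

In this toy case, data-domain smoothing inflates the variance from $s^2$ to $s^2 + \sigma^2$, whereas log-domain smoothing only shifts the quadratic log-density by a constant, so the renormalised density is unchanged.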

Theorems & Definitions (27)

  • Proposition 3.1
  • Theorem 3.6
  • Corollary 3.7
  • Proposition 3.8
  • Theorem 4.1
  • proof: Proof of Proposition \ref{prop:linear_result}
  • Lemma C.1
  • proof
  • Definition C.2
  • Lemma C.3
  • ...and 17 more