A Geometric Perspective on Diffusion Models

Defang Chen, Zhenyu Zhou, Jian-Ping Mei, Chunhua Shen, Chun Chen, Can Wang

TL;DR

This work provides a geometric lens on diffusion models, focusing on the variance-exploding SDE (VE-SDE) and its probability-flow ODE, to reveal two coupled trajectories that govern sampling: a quasi-linear sampling path connecting data and noise, and an implicit denoising trajectory that converges faster. It shows that second-order samplers arise as finite differences of the denoising trajectory and establishes a theoretical link between optimal ODE-based sampling and annealed mean shift, yielding monotone increases in sample likelihood under mild conditions. The analysis yields practical insights, including the ODE-Jump strategy, and explains why modest score deviation can preserve generative ability while mitigating mode collapse. By leveraging a change-of-variables view, the results extend to other SDE families and inform fast sampling, distillation-based methods, and latent interpolation. Overall, the paper deepens the understanding of diffusion dynamics and offers actionable directions for faster, more reliable generation.

Abstract

Recent years have witnessed significant progress in developing effective training and fast sampling techniques for diffusion models. A remarkable advancement is the use of stochastic differential equations (SDEs) and their marginal-preserving ordinary differential equations (ODEs) to describe data perturbation and generative modeling in a unified framework. In this paper, we carefully inspect the ODE-based sampling of a popular variance-exploding SDE and reveal several intriguing structures of its sampling dynamics. We discover that the data distribution and the noise distribution are smoothly connected by a quasi-linear sampling trajectory and by another, implicit denoising trajectory that converges even faster. Meanwhile, the denoising trajectory governs the curvature of the corresponding sampling trajectory, and its finite differences yield various second-order samplers used in practice. Furthermore, we establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm, with which we can characterize the asymptotic behavior of diffusion models and identify the empirical score deviation. Code is available at https://github.com/zju-pi/diff-sampler.
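The mean-shift connection mentioned in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' implementation. It assumes a variance-exploding perturbation with noise level equal to t and a finite dataset sampled uniformly. Under those assumptions the optimal denoising output is the posterior mean of the clean data, a softmax-weighted average of data points, which has the same form as one Gaussian-kernel mean-shift update with bandwidth t. The function names `optimal_denoiser` and `mean_shift_step` are hypothetical.

```python
import numpy as np

def optimal_denoiser(x, data, t):
    """Posterior mean E[y | x] when x = y + t * noise and y is uniform over `data`."""
    sq_dist = np.sum((data - x) ** 2, axis=1)
    logits = -sq_dist / (2.0 * t ** 2)
    logits -= logits.max()  # subtract the max to stabilise the softmax
    weights = np.exp(logits)
    weights /= weights.sum()
    return weights @ data

def mean_shift_step(x, data, bandwidth):
    """One classic mean-shift update with a Gaussian kernel."""
    sq_dist = np.sum((data - x) ** 2, axis=1)
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return (kernel[:, None] * data).sum(axis=0) / kernel.sum()

# Toy 2-D dataset (assumption: stands in for the training set).
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 2))
x = np.array([0.5, -1.0])
t = 0.8

denoised = optimal_denoiser(x, data, t)
shifted = mean_shift_step(x, data, bandwidth=t)
```

In this toy setting the two results agree up to floating-point error. Since the noise level t decreases during sampling, the bandwidth shrinks from step to step, which is the sense in which the mean shift is annealed.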

Paper Structure

This paper contains 31 sections, 16 theorems, 53 equations, 28 figures, 2 tables.

Key Result

Proposition 1

The denoising output $r_{\boldsymbol{\theta}}(\mathbf{x}; t)$ reflects the prediction made by a single Euler step of Eq. (eq:epf_ode) from any sample $\mathbf{x}$ at any time $t$ toward $t=0$.
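Proposition 1 can be checked numerically. The sketch below assumes that the referenced ODE has the form $\mathrm{d}\mathbf{x}/\mathrm{d}t = (\mathbf{x} - r_{\boldsymbol{\theta}}(\mathbf{x}; t))/t$, the common parameterization when the noise level equals $t$. The equation is cited here only by label, so this form is an assumption. The closed-form `toy_denoiser` (Gaussian data with hypothetical mean `mu` and standard deviation `s`) stands in for a trained network.

```python
import numpy as np

def toy_denoiser(x, t, mu=2.0, s=1.0):
    """Closed-form denoiser for data ~ N(mu, s^2 I) perturbed by noise of std t."""
    return (s ** 2 * x + t ** 2 * mu) / (s ** 2 + t ** 2)

def ode_drift(x, t):
    """Assumed ODE right-hand side: dx/dt = (x - r(x; t)) / t."""
    return (x - toy_denoiser(x, t)) / t

def euler_step(x, t_from, t_to):
    """One explicit Euler step of the ODE from time t_from to time t_to."""
    return x + (t_to - t_from) * ode_drift(x, t_from)

x = np.array([5.0, -3.0, 0.7])
t = 4.0

# Jump from time t straight to t = 0 with a single Euler step.
jump = euler_step(x, t, 0.0)
```

Under the assumed form, the step evaluates to $\mathbf{x} - t \cdot (\mathbf{x} - r)/t = r$, so `jump` matches `toy_denoiser(x, t)` up to rounding, whatever denoiser is plugged in.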

Figures (28)

  • Figure 1: The geometric picture of ODE-based sampling in diffusion models. An initial sample (from the noise distribution) starts from a big sphere and converges to its final sample (in the data manifold) along a smooth, quasi-linear sampling trajectory. Meanwhile, its denoising output lies on an implicit, smooth denoising trajectory starting from the approximate dataset mean. The denoising output is relatively close to the final sample and converges much faster in terms of visual quality.
  • Figure 2: (a) The sample magnitude ($\ell_2$ norm) expands in the forward process (brown curve) and shrinks in the backward process (gray circles). (b) The trajectory deviation (red curve) is calculated as $d(\hat{\mathbf{x}}_{t_n}, \left[\hat{\mathbf{x}}_{t_0}\hat{\mathbf{x}}_{t_N}\right])$ for the sampling trajectory and as $d(r_{\boldsymbol{\theta}}(\hat{\mathbf{x}}_{t_n}), \left[r_{\boldsymbol{\theta}}(\hat{\mathbf{x}}_{t_1})r_{\boldsymbol{\theta}}(\hat{\mathbf{x}}_{t_N})\right])$ for the denoising trajectory, leading to Observation \ref{obs:bent}. The distance (blue curve) between the final sample and the generated intermediate samples is calculated as $d(\hat{\mathbf{x}}_{t_n}, \hat{\mathbf{x}}_{t_0})$ for the sampling trajectory and as $d(r_{\boldsymbol{\theta}}(\hat{\mathbf{x}}_{t_n}), r_{\boldsymbol{\theta}}(\hat{\mathbf{x}}_{t_1}))$ for the denoising trajectory, leading to Observation \ref{obs:converge}.
  • Figure 3: Comparison of visual quality (top: sampling trajectory; bottom: denoising trajectory) and Fréchet Inception Distance (FID; Heusel et al., 2017; lower is better) with respect to the number of score function evaluations (NFEs). More results are provided in Appendix \ref{subsec:visual}. The denoising trajectory converges much faster than the sampling trajectory in terms of FID and visual quality.
  • Figure 4: The likelihoods of $r_{\boldsymbol{\theta}}(\hat{\mathbf{x}}_{t_n})$ and $\hat{\mathbf{x}}_{t_{n-1}}$ are larger than that of $\hat{\mathbf{x}}_{t_n}$. The ratio of $\left\lVert r_{\boldsymbol{\theta}}^{\star}(\hat{\mathbf{x}}_{t_n}) - r_{\boldsymbol{\theta}}(\hat{\mathbf{x}}_{t_n}) \right\rVert$ to $\left\lVert r_{\boldsymbol{\theta}}^{\star}(\hat{\mathbf{x}}_{t_n}) - \hat{\mathbf{x}}_{t_n}\right\rVert$ is consistently lower than one in the sampling trajectory.
  • Figure 5: Top: We visualize a forward diffusion process of a randomly selected image to obtain its encoding $\hat{\mathbf{x}}_{t_N}$ (first row) and simulate multiple trajectories starting from this encoding (other rows). Bottom: The k nearest neighbors (k=5) of $\hat{\mathbf{x}}_{t_0}$ and $\hat{\mathbf{x}}_{t_0}^{\star}$ among the real samples in the dataset.
  • ...and 23 more figures
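
The two quantities plotted in Figure 2(b) can be sketched as follows. This is an illustrative reading, not the authors' code. It assumes that $d(\cdot, [\mathbf{a}\,\mathbf{b}])$ denotes the perpendicular Euclidean distance from a point to the straight line through the endpoints $\mathbf{a}$ and $\mathbf{b}$, and that $d(\cdot, \cdot)$ is the Euclidean distance. Neither is defined in this listing. The trajectory is synthetic and the function names are hypothetical.

```python
import numpy as np

def deviation_from_chord(points, start, end):
    """Perpendicular distance of each point to the line through start and end."""
    direction = (end - start) / np.linalg.norm(end - start)
    offsets = points - start
    along = offsets @ direction  # signed length of the projection on the chord
    perpendicular = offsets - along[:, None] * direction
    return np.linalg.norm(perpendicular, axis=1)

def distance_to_final(points, final):
    """Euclidean distance of each point to the final sample."""
    return np.linalg.norm(points - final, axis=1)

# Synthetic, slightly bent 2-D trajectory; index 0 plays the role of the final sample.
s = np.linspace(0.0, 1.0, 11)
trajectory = np.stack([10.0 * s, np.sin(np.pi * s)], axis=1)

deviation = deviation_from_chord(trajectory, trajectory[0], trajectory[-1])
distance = distance_to_final(trajectory, trajectory[0])
```

A small `deviation` relative to the chord length indicates a quasi-linear trajectory, while `distance` tracks how quickly intermediate points approach the final sample.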

Theorems & Definitions (36)

  • Proposition 1
  • proof
  • Proposition 2
  • Proposition 3
  • proof
  • Proposition 4
  • Proposition 5
  • proof
  • Theorem 1
  • Corollary 1
  • ...and 26 more