Generative diffusion models from a PDE perspective
Fei Cao, Kimball Johnston, Thomas Laurent, Justin Le, Sébastien Motsch
TL;DR
From a PDE perspective, the paper derives the reverse diffusion equation corresponding to the forward Ornstein–Uhlenbeck process and shows that the exact reverse flow keeps the support of the generated distribution inside the support of the original data distribution $ρ_0$, so it cannot generate new samples. It connects the PDE derivation to the traditional SDE view, presenting the reverse SDE with a time-dependent generator and two equivalent representations: a score-based form involving $∇ \log ρ$ and a mean-field form using $\langle X_0\rangle$. The authors analyze the Dirac initial condition, an explicitly solvable case that demonstrates how perfect reversal returns the original samples (an overfitting regime), and underscore the necessity of imperfect learning for real generalization. They further discuss practical aspects via score matching and stable diffusion, including kernel interpretations when $ρ_0$ is empirical, and argue that successful generalization relies on the interplay between the reverse dynamics and model approximation (e.g., UNets) rather than on the PDE reversal alone.
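For orientation, the two representations of the reverse SDE can be written out under one common normalization. This is a sketch, not necessarily the paper's notation: the forward convention $dX_t = -X_t\,dt + \sqrt{2}\,dB_t$, the horizon $T$, the reverse time $s = T - t$, and the reading of $\langle X_0\rangle$ as the conditional mean $\mathbb{E}[X_0 \mid X_t = x]$ are assumptions made here.

$$
\begin{aligned}
\text{forward:}\quad & X_t = e^{-t} X_0 + \sqrt{1 - e^{-2t}}\,\xi, \qquad \xi \sim \mathcal{N}(0, I),\\
\text{score form:}\quad & dY_s = \bigl(Y_s + 2\,\nabla \log ρ_{T-s}(Y_s)\bigr)\,ds + \sqrt{2}\,dB_s,\\
\text{identity:}\quad & \nabla \log ρ_t(x) = \frac{e^{-t}\,\langle X_0\rangle - x}{1 - e^{-2t}}, \qquad \langle X_0\rangle = \mathbb{E}[X_0 \mid X_t = x],\\
\text{mean-field form:}\quad & dY_s = \Bigl(Y_s + 2\,\frac{e^{-(T-s)}\,\langle X_0\rangle - Y_s}{1 - e^{-2(T-s)}}\Bigr)\,ds + \sqrt{2}\,dB_s.
\end{aligned}
$$

Substituting the identity into the score form gives the mean-field form, so under these assumptions the two are the same equation.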
Abstract
Diffusion models have become the de facto framework for generating new datasets. The core of these models lies in the ability to reverse a diffusion process in time. The goal of this manuscript is to explain, from a PDE perspective, how this method works and how to derive the PDE governing the reverse dynamics as well as to study its solution analytically. By linking forward and reverse dynamics, we show that the reverse process's distribution has its support contained within the original distribution. Consequently, diffusion methods, in their analytical formulation, do not inherently regularize the original distribution, and thus, there is no generalization principle. This raises a question: where does generalization arise, given that in practice it does occur? Moreover, we derive an explicit solution to the reverse process's SDE under the assumption that the starting point of the forward process is fixed. This provides a new derivation that links two popular approaches to generative diffusion models: stable diffusion (discrete dynamics) and the score-based approach (continuous dynamics). Finally, we explore the case where the original distribution consists of a finite set of data points. In this scenario, the reverse dynamics are explicit (i.e., the loss function has a clear minimizer), and solving the dynamics fails to generate new samples: the dynamics converge to the original samples. In a sense, solving the minimization problem exactly is "too good for its own good" (i.e., an overfitting regime).
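The finite-data case described above can be illustrated numerically. The following is a minimal sketch, not the authors' code. It assumes the forward convention $dX_t = -X_t\,dt + \sqrt{2}\,dB_t$, one-dimensional data, and an Euler–Maruyama discretization on a geometric time grid. The function names (`empirical_score`, `reverse_sample`), the three data points, and all numerical parameters are hypothetical choices made for illustration. When $ρ_0$ is an empirical measure, $ρ_t$ is a Gaussian mixture, so the score is available in closed form and no neural network is needed.

```python
import numpy as np

def empirical_score(y, data, t):
    """Exact score of rho_t when rho_0 is the empirical measure on `data` (1D).

    Assumes X_t | X_0 = x_i ~ N(exp(-t) * x_i, 1 - exp(-2t)).
    """
    mean = np.exp(-t) * data             # mixture centers, shape (N,)
    var = 1.0 - np.exp(-2.0 * t)         # common variance sigma_t^2
    diff = mean[None, :] - y[:, None]    # shape (M, N)
    logits = -diff**2 / (2.0 * var)
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)    # posterior weight of each data point
    return (w * diff).sum(axis=1) / var

def reverse_sample(data, n_samples=2000, T=5.0, t_min=1e-3, n_steps=600, seed=0):
    """Euler-Maruyama for dY = (Y + 2 * score) ds + sqrt(2) dB, run from t=T down to t_min."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n_samples)   # rho_T is close to N(0, 1) for large T
    # Geometric grid: steps shrink with t, which keeps the scheme stable as sigma_t -> 0.
    times = np.geomspace(T, t_min, n_steps + 1)
    for t, t_next in zip(times[:-1], times[1:]):
        dt = t - t_next
        drift = y + 2.0 * empirical_score(y, data, t)
        y = y + drift * dt + np.sqrt(2.0 * dt) * rng.standard_normal(n_samples)
    return y

data = np.array([-2.0, 0.5, 3.0])        # hypothetical "training set"
samples = reverse_sample(data)
# Distance from each generated sample to its nearest training point.
dist = np.abs(samples[:, None] - data[None, :]).min(axis=1)
print("mean distance to nearest data point:", dist.mean())
```

In this sketch the generated samples end up concentrated around the three training points, within roughly the residual noise scale $\sqrt{1 - e^{-2 t_{\min}}}$, rather than filling in the space between them. This is the overfitting regime in miniature: with the exact score, the reverse dynamics return the data they started from.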
