Understanding Generalization in Diffusion Models via Probability Flow Distance
Huijie Zhang, Zijian Huang, Siyi Chen, Jinfan Zhou, Zekai Zhang, Peng Wang, Qing Qu
TL;DR
This paper introduces Probability Flow Distance (PFD), a theoretically grounded and computationally efficient metric for measuring distributional generalization in diffusion models, defined through the noise-to-data mapping induced by the backward PF-ODE. Using a teacher–student protocol, it quantitatively separates memorization from generalization and reveals a scaling behavior in which the memorization-to-generalization transition aligns with the ratio $N / \sqrt{|\bm{\theta}|}$. It also reports early learning and double descent training dynamics, along with a bias–variance decomposition of the generalization error. The findings illuminate how model capacity and data interact in diffusion models, and demonstrate that PFD can reliably assess generalization, going beyond generation-quality metrics such as FID. This framework lays a foundation for principled empirical and theoretical studies of generalization in diffusion models and suggests directions for extending the approach to other generative paradigms.
Abstract
Diffusion models have emerged as a powerful class of generative models, capable of producing high-quality samples that generalize beyond the training data. However, evaluating this generalization remains challenging: theoretical metrics are often impractical for high-dimensional data, while no practical metrics rigorously measure generalization. In this work, we bridge this gap by introducing probability flow distance ($\texttt{PFD}$), a theoretically grounded and computationally efficient metric to measure distributional generalization. Specifically, $\texttt{PFD}$ quantifies the distance between distributions by comparing their noise-to-data mappings induced by the probability flow ODE. Moreover, by using $\texttt{PFD}$ under a teacher-student evaluation protocol, we empirically uncover several key generalization behaviors in diffusion models, including: (1) scaling behavior from memorization to generalization, (2) early learning and double descent training dynamics, and (3) bias-variance decomposition. Beyond these insights, our work lays a foundation for future empirical and theoretical studies on generalization in diffusion models.
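To make the idea of comparing noise-to-data mappings concrete, the following is a minimal sketch, not the authors' implementation. It assumes one-dimensional Gaussian target distributions (so the score is available in closed form), a variance-exploding noise schedule with $\sigma(t) = t$, a plain Euler solver, and a root-mean-square distance between the two ODE endpoints reached from shared noise. The function names `gaussian_score`, `pf_ode_map`, and `pfd_estimate`, and all default parameter values, are hypothetical choices made for this illustration.

```python
import numpy as np

def gaussian_score(x, t, mean, std):
    # Score of the noised density N(mean, std^2 + t^2), assuming sigma(t) = t.
    return -(x - mean) / (std**2 + t**2)

def pf_ode_map(x_T, mean, std, T=80.0, t_min=1e-3, n_steps=200):
    # Integrate the probability flow ODE dx/dt = -t * score(x, t)
    # backward from t = T to t = t_min with Euler steps (an assumed solver).
    ts = np.geomspace(T, t_min, n_steps + 1)
    x = np.array(x_T, dtype=float)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        drift = -t_cur * gaussian_score(x, t_cur, mean, std)
        x = x + (t_next - t_cur) * drift
    return x

def pfd_estimate(params_p, params_q, n_samples=4096, T=80.0, seed=0):
    # Map the SAME noise samples through both ODEs and compare the endpoints.
    # The RMS form used here is an assumption made for illustration.
    rng = np.random.default_rng(seed)
    x_T = T * rng.standard_normal(n_samples)
    x_p = pf_ode_map(x_T, *params_p, T=T)
    x_q = pf_ode_map(x_T, *params_q, T=T)
    return float(np.sqrt(np.mean((x_p - x_q) ** 2)))

if __name__ == "__main__":
    # Each tuple is (mean, std) of a toy Gaussian "data" distribution.
    print(pfd_estimate((0.0, 1.0), (0.0, 1.0)))  # identical distributions
    print(pfd_estimate((0.0, 1.0), (1.0, 1.0)))  # means differ by 1
```

In this toy setting, identical distributions give a distance of zero because both mappings coincide, while shifting the mean by 1 gives a distance close to 1. In practice the closed-form score would be replaced by trained diffusion models (for example, a teacher and a student), and the solver and distance would follow the definitions in the paper itself.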
