To smooth a cloud or to pin it down: Guarantees and Insights on Score Matching in Denoising Diffusion Models
Francisco Vargas, Teodora Reu, Anna Kerekes, Michael M Bronstein
TL;DR
This work develops a stochastic-control perspective on score matching for denoising diffusion models, linking VP-SDE scores to the OU semigroup and leveraging Föllmer drift insights to obtain neural-network expressiveness guarantees for score approximation. It provides rigorous regularity results and entropy bounds for the OU-based framework, enabling controlled approximation errors and sampling guarantees. The authors contrast VP-SDE with pinned Brownian motion through comprehensive simulations on synthetic and image data, showing VP-SDE can achieve better score estimation and sampling with the same network budget. Overall, the paper advances theoretical understanding and practical performance of continuous-time diffusion samplers, with potential impact on scalable, accurate density estimation and generative modelling.
Abstract
Denoising diffusion models are a class of generative models which have recently achieved state-of-the-art results across many domains. Noise is gradually added to the data by a diffusion process, which transforms the data distribution into a Gaussian. Samples from the generative model are then obtained by simulating an approximation of the time reversal of this diffusion, initialized with Gaussian samples. Recent research has explored adapting diffusion models for sampling and inference tasks. In this paper, we leverage known connections between diffusion models and stochastic control, akin to those established for the Föllmer drift, to extend existing neural network approximation results for the Föllmer drift to denoising diffusion models and samplers.
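The noising and time-reversal mechanism described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a one-dimensional Gaussian data distribution, a constant noise schedule (so the VP-SDE reduces to the OU process dX = -X dt + sqrt(2) dW), and it uses the closed-form score of the noised marginal where a trained network would normally go. The names `MU`, `SIGMA0`, `forward_noise`, `score`, and `reverse_sample` are hypothetical.

```python
import numpy as np

# Toy data distribution N(MU, SIGMA0^2); both values are illustrative assumptions.
MU, SIGMA0 = 2.0, 0.5


def forward_noise(x0, t, rng):
    """Sample X_t given X_0 = x0 for the OU process dX = -X dt + sqrt(2) dW."""
    z = rng.standard_normal(np.shape(x0))
    return np.exp(-t) * x0 + np.sqrt(1.0 - np.exp(-2.0 * t)) * z


def score(x, t):
    """Exact score of the noised marginal p_t = N(m_t, v_t).

    Stands in for a learned score network; only available in closed form
    because the toy data distribution is Gaussian.
    """
    m_t = MU * np.exp(-t)
    v_t = SIGMA0**2 * np.exp(-2.0 * t) + 1.0 - np.exp(-2.0 * t)
    return -(x - m_t) / v_t


def reverse_sample(n, T=5.0, steps=1000, rng=None):
    """Euler-Maruyama simulation of the time-reversed OU process.

    With s = T - t, the reversal follows
        dY = [Y + 2 * score(Y, T - s)] ds + sqrt(2) dW,
    started from the Gaussian reference N(0, 1).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    dt = T / steps
    y = rng.standard_normal(n)  # initialise with Gaussian samples
    for k in range(steps):
        t = T - k * dt  # forward time corresponding to the current step
        drift = y + 2.0 * score(y, t)
        y = y + drift * dt + np.sqrt(2.0 * dt) * rng.standard_normal(n)
    return y


if __name__ == "__main__":
    samples = reverse_sample(20_000)
    print("sample mean:", samples.mean(), "sample std:", samples.std())
```

Run as a script, the sample mean and standard deviation should land close to `MU` and `SIGMA0`, up to discretisation and Monte Carlo error. Replacing `score` with an approximation is the point at which score-estimation error enters the sampler.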
