Score-based generative models break the curse of dimensionality in learning a family of sub-Gaussian probability distributions
Frank Cole, Yulong Lu
TL;DR
This work provides a theoretical foundation for score-based generative models (SGMs) learning sub-Gaussian distributions. It measures the complexity of a target distribution through its log-relative density $f$ with respect to the standard Gaussian measure and links $f$ to Barron-space function classes. The authors prove that if $f$ can be locally approximated by a neural network with bounded path norm, then the score $\nabla_x \log p_t$ at any fixed time $t$ can be approximated in $L^2(p_t)$ without the curse of dimensionality, and that empirical score matching yields total variation (TV) guarantees for the target distribution. They derive explicit sample-size requirements and illustrate the theory with examples, including Barron-function targets and certain Gaussian mixtures, for which the resulting rates are dimension-independent. A central technical ingredient is a dimension-free neural network approximation rate for the score of the forward process. Together, the results offer a rigorous explanation for the empirical success of SGMs in high dimensions when the target density has low complexity in this sense.
Abstract
While score-based generative models (SGMs) have achieved remarkable success in a wide range of image generation tasks, their mathematical foundations remain limited. In this paper, we analyze the approximation and generalization of SGMs in learning a family of sub-Gaussian probability distributions. We introduce a notion of complexity for probability distributions in terms of their relative density with respect to the standard Gaussian measure. We prove that if the log-relative density can be locally approximated by a neural network whose parameters can be suitably bounded, then the distribution generated by empirical score matching approximates the target distribution in total variation with a dimension-independent rate. We illustrate our theory through examples, which include certain mixtures of Gaussians. An essential ingredient of our proof is a dimension-free deep neural network approximation rate for the true score function associated with the forward process, which is of independent interest.
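To make "the true score function associated with the forward process" concrete, the following minimal sketch computes it in closed form for an isotropic Gaussian mixture, one of the example families mentioned above. It rests on assumptions not stated in the abstract: the forward process is taken to be the Ornstein-Uhlenbeck process $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$, under which each mixture component stays Gaussian, and the function names (`forward_mixture_params`, `log_density`, `score`) and the example parameters are hypothetical, chosen only for illustration. This is the exact score that a network would be trained to approximate, not the authors' estimator.

```python
import numpy as np


def forward_mixture_params(means, variances, t):
    """Component means/variances of p_t for an isotropic Gaussian mixture p_0.

    ASSUMPTION: OU forward process dX = -X dt + sqrt(2) dW, so that
    X_t = exp(-t) X_0 + sqrt(1 - exp(-2t)) Z with Z ~ N(0, I).
    """
    decay = np.exp(-t)
    means_t = decay * means                                  # shape (k, d)
    variances_t = decay**2 * variances + (1.0 - decay**2)    # shape (k,)
    return means_t, variances_t


def _log_components(x, weights, means_t, variances_t):
    """log(w_i) + log N(x; m_i, v_i I) for every component i."""
    d = x.shape[0]
    sq_dist = np.sum((x - means_t) ** 2, axis=1)
    return (np.log(weights)
            - 0.5 * sq_dist / variances_t
            - 0.5 * d * np.log(2.0 * np.pi * variances_t))


def log_density(x, weights, means, variances, t):
    """log p_t(x), evaluated with a stable log-sum-exp."""
    means_t, variances_t = forward_mixture_params(means, variances, t)
    log_comp = _log_components(x, weights, means_t, variances_t)
    top = log_comp.max()
    return top + np.log(np.sum(np.exp(log_comp - top)))


def score(x, weights, means, variances, t):
    """Exact score grad_x log p_t(x) of the noised mixture."""
    means_t, variances_t = forward_mixture_params(means, variances, t)
    log_comp = _log_components(x, weights, means_t, variances_t)
    # Posterior probability that x came from each component.
    resp = np.exp(log_comp - log_comp.max())
    resp /= resp.sum()
    # Each component contributes the Gaussian score -(x - m_i) / v_i.
    per_component = -(x - means_t) / variances_t[:, None]
    return np.sum(resp[:, None] * per_component, axis=0)


# Hypothetical example: a two-component mixture in dimension 10.
rng = np.random.default_rng(0)
dim = 10
weights = np.array([0.4, 0.6])
means = rng.normal(size=(2, dim))
variances = np.array([0.5, 1.5])
x = rng.normal(size=dim)
print(score(x, weights, means, variances, t=0.5))
```

As $t$ grows, the component means shrink to zero and the variances tend to one, so the computed score approaches $-x$, the score of the standard Gaussian reference measure.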
