Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization
Fangzhao Zhang, Mert Pilanci
TL;DR
This work analyzes neural network-based diffusion models by deriving convex reformulations of score matching and denoising score matching for two-layer networks. It shows that training these networks to predict the score function can be performed via convex programs, yielding an exact characterization of the predicted score with finite data and convergence guarantees for Langevin sampling under appropriate conditions. Key contributions include explicit univariate convex formulations with a piecewise-linear score, a dual characterization for multivariate data using arrangement matrices, and practical insights into when the learned score corresponds to a Gaussian or Gaussian-mixture distribution. The results provide a non-asymptotic, theoretically grounded view of what two-layer neural score predictors learn and offer potential numerical benefits from using convex optimization in diffusion-model training.
Abstract
Diffusion models are gaining widespread use in cutting-edge image, video, and audio generation. Score-based diffusion models stand out among these methods, necessitating the estimation of the score function of the input data distribution. In this study, we present a theoretical framework to analyze two-layer neural network-based diffusion models by reframing score matching and denoising score matching as convex optimization. We prove that training shallow neural networks for score prediction can be done by solving a single convex program. Although most analyses of diffusion models operate in the asymptotic setting or rely on approximations, we characterize the exact predicted score function and establish convergence results for neural network-based diffusion models with finite data. Our results provide a precise characterization of what neural network-based diffusion models learn in non-asymptotic settings.
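To make the role of the score function concrete, the following is a minimal sketch of how a score is used for sampling with unadjusted Langevin dynamics. It is not the paper's convex program: the closed-form score of a one-dimensional Gaussian stands in for a learned score predictor, and the function names, the step size, and the values of `mu` and `sigma` are all illustrative assumptions.

```python
import math
import random


def gaussian_score(x, mu=1.0, sigma=0.5):
    """Score (derivative of the log-density) of N(mu, sigma^2).

    Assumption: this closed form stands in for a learned score predictor.
    """
    return -(x - mu) / sigma**2


def langevin_sample(score, n_steps=2000, step=1e-3, x0=0.0, rng=None):
    """Run one chain of unadjusted Langevin dynamics and return its final state.

    Update rule: x <- x + step * score(x) + sqrt(2 * step) * standard normal noise.
    """
    if rng is None:
        rng = random.Random(0)
    x = x0
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        x = x + step * score(x) + noise_scale * rng.gauss(0.0, 1.0)
    return x


def draw_samples(n_samples=400, seed=0):
    """Draw approximate samples by running independent chains with a shared RNG."""
    rng = random.Random(seed)
    return [langevin_sample(gaussian_score, rng=rng) for _ in range(n_samples)]


if __name__ == "__main__":
    samples = draw_samples()
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(f"sample mean: {mean:.3f}, sample variance: {var:.3f}")
```

With a small step size, the empirical mean and variance of the chains should be close to `mu` and `sigma**2`. Replacing `gaussian_score` with any other callable score estimate leaves the sampling loop unchanged.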
