Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization

Fangzhao Zhang, Mert Pilanci

TL;DR

The work addresses the problem of understanding neural network-based diffusion models by deriving convex reformulations of score matching and denoising score matching for two-layer networks. It shows that training these networks to predict the score function can be performed via convex programs, yielding exact finite-sample predictions and convergence guarantees for Langevin sampling under appropriate conditions. Key contributions include explicit univariate convex formulations with a piecewise-linear score, a dual characterization for multivariate data using arrangement matrices, and practical insights into when the learned score corresponds to Gaussian or Gaussian-mixture distributions. The results provide a non-asymptotic, theoretically grounded view of what two-layer neural score predictors learn and offer potential numerical benefits through convex optimization for diffusion-model training.

Abstract

Diffusion models are gaining widespread use in cutting-edge image, video, and audio generation. Score-based diffusion models stand out among these methods, necessitating the estimation of the score function of the input data distribution. In this study, we present a theoretical framework to analyze two-layer neural network-based diffusion models by reframing score matching and denoising score matching as convex optimization. We prove that training shallow neural networks for score prediction can be done by solving a single convex program. Although most analyses of diffusion models operate in the asymptotic setting or rely on approximations, we characterize the exact predicted score function and establish convergence results for neural network-based diffusion models with finite data. Our results provide a precise characterization of what neural network-based diffusion models learn in non-asymptotic settings.

Paper Structure

This paper contains 43 sections, 11 theorems, 123 equations, 5 figures, 2 algorithms.

Key Result

Theorem 3.1

When $\sigma$ is the ReLU or absolute value activation and $V=0$, denote by $p^*$ the optimal value of the score matching objective (train_obj) with $s_\theta$ specified in (general_arc). When $m\geq \text{len}(y)$ and $\beta\geq 1$, ... where the entries of $A$ are determined by the pairwise distances between data points, and the entr...

Note: when $\beta<1$, the optimal value of problem (train_obj) may be unbounded.
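
The objective (train_obj) and the architecture (general_arc) are not reproduced in this excerpt, so the following is only a minimal sketch of what a score matching objective for a univariate two-layer ReLU network could look like. It assumes the standard Hyvärinen form, the empirical mean of $\tfrac{1}{2}s_\theta(x)^2 + s_\theta'(x)$, no skip connection ($V=0$), and a squared-norm weight decay with coefficient `beta`. The function names and the exact form of the regularizer are illustrative assumptions, not the paper's definitions.

```python
def relu(z):
    return max(z, 0.0)


def score(x, w1, b1, w2, b0=0.0):
    """Two-layer ReLU score s(x) = sum_j w2[j] * relu(w1[j] * x + b1[j]) + b0 (no skip connection)."""
    return sum(a * relu(w * x + b) for w, b, a in zip(w1, b1, w2)) + b0


def score_derivative(x, w1, b1, w2):
    """Derivative of s with respect to x, defined away from the ReLU kinks."""
    return sum(a * w * (1.0 if w * x + b > 0 else 0.0) for w, b, a in zip(w1, b1, w2))


def score_matching_loss(xs, w1, b1, w2, b0=0.0, beta=0.0):
    """Empirical Hyvarinen objective plus an ASSUMED weight decay 0.5 * beta * (|w1|^2 + |w2|^2)."""
    data_term = sum(
        0.5 * score(x, w1, b1, w2, b0) ** 2 + score_derivative(x, w1, b1, w2)
        for x in xs
    ) / len(xs)
    weight_decay = 0.5 * beta * (sum(w * w for w in w1) + sum(a * a for a in w2))
    return data_term + weight_decay


# Two neurons reproduce the linear score s(x) = -x, since -relu(x) + relu(-x) = -x.
xs = [-2.0, -1.0, 1.0, 2.0]
w1, b1, w2 = [1.0, -1.0], [0.0, 0.0], [-1.0, 1.0]
loss_no_decay = score_matching_loss(xs, w1, b1, w2)
loss_with_decay = score_matching_loss(xs, w1, b1, w2, beta=1.0)
```

With these four points the data term equals $\tfrac{1}{2}\cdot 2.5 - 1 = 0.25$, and the assumed weight decay with `beta=1.0` adds $2$.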

Figures (5)

  • Figure 1: Predicted score function and its integration for univariate data with a two-layer neural network with ReLU activation (left) and absolute value activation (right). The left subplot shows all optimal score predictions by the convex score predictor for univariate input data of arbitrary distribution for a certain weight decay range, and the right subplot shows its integration. See Section \ref{convergence} for details.
  • Figure 2: Simulation results for score matching tasks with a two-layer ReLU neural network. The left figure is for Gaussian data; the right figure is for a two-component Gaussian mixture. Sampling histograms are obtained with Langevin dynamics. See Section \ref{score_simu_sec} for details.
  • Figure 3: 2D simulation results for denoising score matching tasks with our convex score predictor. The second figure shows a vector field plot of the score predicted by our convex score predictor. The right plots show the denoising procedure with different noise levels in annealed Langevin sampling. See Section \ref{dsm_simu_sec} for details.
  • Figure 4: Simulation results for score matching tasks with a two-layer neural network. The left subplots for all four categories show training loss, where the dashed blue lines indicate the loss of the convex score predictor. The middle plots show the score prediction by the convex score predictor. The right plots show sampling histograms via the plain Langevin process with the convex score predictor. See Appendix \ref{score_simu_supp} for details.
  • Figure 5: Simulation results for denoising score matching tasks with a two-layer ReLU neural network. The left plot shows training loss, where the dashed blue line indicates the loss of the convex score predictor (\ref{thm4formula}). The second plot shows the sampling histogram via the annealed Langevin process with the convex score predictor. The third, fourth, and fifth plots show sampling histograms via the annealed Langevin process with non-convex score predictors trained with learning rates $1$, $10^{-2}$, and $10^{-6}$, respectively. The ground truth distribution is standard Gaussian, which is recovered by our model.
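
Several of the captions above refer to sampling with Langevin dynamics. As a rough illustration only, the sketch below runs the plain (unadjusted) Langevin update $x \leftarrow x + \epsilon\, s(x) + \sqrt{2\epsilon}\, z$ with $z \sim \mathcal{N}(0,1)$, using the known score $s(x) = -x$ of a standard Gaussian in place of a learned predictor. The step size, chain length, starting point, and function names are assumptions chosen for the example, not the paper's experimental settings.

```python
import math
import random


def gaussian_score(x):
    """Score of a standard Gaussian: d/dx log p(x) = -x (stand-in for a learned score)."""
    return -x


def langevin_chain(score_fn, x0, step, n_steps, rng):
    """Run one unadjusted Langevin chain and return its final state."""
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + step * score_fn(x) + math.sqrt(2.0 * step) * noise
    return x


def langevin_samples(score_fn, n_samples=2000, x0=3.0, step=0.05, n_steps=300, seed=0):
    """Draw samples by running independent chains from the same starting point."""
    rng = random.Random(seed)
    return [langevin_chain(score_fn, x0, step, n_steps, rng) for _ in range(n_samples)]


samples = langevin_samples(gaussian_score)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(f"mean = {sample_mean:.3f}, variance = {sample_var:.3f}")
```

For this linear score, the discretized chain has stationary variance $1/(1-\epsilon/2)$, so a small step size keeps the samples close to the target variance of 1.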

Theorems & Definitions (30)

  • Theorem 3.1
  • proof
  • Theorem 3.2
  • proof
  • Theorem 3.3
  • proof
  • Theorem 4.1
  • proof
  • Theorem 4.2
  • proof
  • ...and 20 more