The Hidden Linear Structure in Score-Based Models and its Application

Binxu Wang, John J. Vastola

TL;DR

This work claims that, for well-trained diffusion models, the learned score at high noise scales is well approximated by the linear score of a Gaussian, and it derives the closed-form solution to the score-based model with a Gaussian score.

Abstract

Score-based models have achieved remarkable results in the generative modeling of many domains. By learning the gradient of the smoothed data distribution, they can iteratively generate samples from complex distributions, e.g., natural images. However, is there any universal structure in the gradient field that will eventually be learned by any neural network? Here, we aim to find such structures through a normative analysis of the score function. First, we derived the closed-form solution to the score-based model with a Gaussian score. We claimed that for well-trained diffusion models, the learned score at a high noise scale is well approximated by the linear score of a Gaussian. We demonstrated this through empirical validation on a pre-trained image diffusion model and theoretical analysis of the score function. This finding enabled us to precisely predict the initial diffusion trajectory using the analytical solution and to accelerate image sampling by 15-30% by skipping the initial phase without sacrificing image quality. Our finding of the linear structure in the score-based model has implications for better model design and data pre-processing.
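To make the "linear score of a Gaussian" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the data are modeled as a Gaussian with mean `mu` and covariance `cov`, that noise of scale `sigma` is added, and that sampling follows an EDM-style probability flow ODE, $d\mathbf{x}/d\sigma = -\sigma \nabla_{\mathbf{x}} \log p(\mathbf{x};\sigma)$. The function names and the choice of parameterization are illustrative assumptions. Under these assumptions the score is linear in $\mathbf{x}$, and the ODE decouples along the eigenvectors of `cov`, which gives a closed-form trajectory.

```python
import numpy as np

def gaussian_score(x, mu, cov, sigma):
    """Score of N(mu, cov + sigma^2 I) at x: -(cov + sigma^2 I)^{-1} (x - mu)."""
    d = mu.shape[0]
    return -np.linalg.solve(cov + sigma**2 * np.eye(d), x - mu)

def gaussian_ode_solution(x_T, mu, cov, sigma_T, sigma):
    """Closed-form state at noise level sigma, starting from x_T at sigma_T.

    Assumes the ODE dx/dsigma = -sigma * score(x, sigma) with the Gaussian
    score above. Along eigenvector u_k (eigenvalue lam_k) of cov, the
    deviation from mu is rescaled by
    sqrt((lam_k + sigma^2) / (lam_k + sigma_T^2)).
    """
    lam, U = np.linalg.eigh(cov)          # cov = U diag(lam) U^T
    coeff = U.T @ (x_T - mu)              # project deviation onto eigenbasis
    scale = np.sqrt((lam + sigma**2) / (lam + sigma_T**2))
    return mu + U @ (scale * coeff)

def heun_integrate(x_T, mu, cov, sigma_T, sigma, n_steps=2000):
    """Numerical check: integrate the same ODE with Heun's second-order method."""
    sigmas = np.linspace(sigma_T, sigma, n_steps + 1)
    x = np.array(x_T, dtype=float)
    for s0, s1 in zip(sigmas[:-1], sigmas[1:]):
        d0 = -s0 * gaussian_score(x, mu, cov, s0)
        x_pred = x + (s1 - s0) * d0                  # Euler predictor
        d1 = -s1 * gaussian_score(x_pred, mu, cov, s1)
        x = x + (s1 - s0) * 0.5 * (d0 + d1)          # trapezoidal corrector
    return x
```

In this sketch, skipping the initial sampling phase would amount to calling `gaussian_ode_solution` to jump from `sigma_T` to an intermediate noise level and handing the result to a numerical sampler; how the paper chooses that level is not specified in this summary.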

Paper Structure (35 sections, 75 equations, 8 figures, 5 tables, 1 algorithm)

Figures (8)

  • Figure 1: Gaussian score function approximates the learned neural score. A. Conceptual plan of the paper. B. Approximation error of the learned neural score under different analytical scores: Gaussian, isotropic, Gaussian mixture, and 'exact' point cloud. Note the log scale of the error on the y axis. C. Deviation of the diffusion trajectory from the solutions of the analytical scores.
  • Figure 2: Validation and application of the Gaussian score approximation. A. The denoiser images $D(\mathbf{x}_t,t)$ along a diffusion sampling trajectory compared with the Gaussian solution with the same initial condition $\mathbf{x}_T$. B. Samples generated by the EDM model, Gaussian solution, and the 'exact' scores from the same initial condition. C. Sampled image as a function of skipped steps with the hybrid method combining Gaussian theory with Heun's method. D. Image quality (FID score) of the hybrid method as a function of NFE, skipped steps, and skipped noise scale (see Appendix \ref{apd:fid_method}).
  • Figure 3: Approximation error of the score learned by the neural network under various analytical approximations. Note the log scale on the y axis.
  • Figure 4: Deviation between the state trajectory $\mathbf{x}_t$ of the EDM neural network and that of the analytical score. The thick line denotes the mean over initial conditions; the shaded area denotes the 25%-75% quantile range over the initial conditions.
  • Figure 5: Deviation between the denoiser $D(\mathbf{x}_t,t)$ of the EDM neural network and that of the analytical score. The thick line denotes the mean over initial conditions; the shaded area denotes the 25%-75% quantile range over the initial conditions.
  • ...and 3 more figures