An Analytical Theory of Spectral Bias in the Learning Dynamics of Diffusion Models

Binxu Wang, Cengiz Pehlevan

Abstract

We develop an analytical framework for understanding how the generated distribution evolves during diffusion model training. Leveraging a Gaussian-equivalence principle, we solve the full-batch gradient-flow dynamics of linear and convolutional denoisers and integrate the resulting probability-flow ODE, yielding analytic expressions for the generated distribution. The theory exposes a universal inverse-variance spectral law: the time for an eigen- or Fourier mode to match its target variance $\lambda$ scales as $\tau\propto\lambda^{-1}$, so high-variance (coarse) structure is mastered orders of magnitude sooner than low-variance (fine) detail. Extending the analysis to deep linear networks and circulant full-width convolutions shows that weight sharing merely multiplies learning rates, accelerating but not eliminating the bias, whereas local convolution introduces a qualitatively different bias. Experiments on Gaussian and natural-image datasets confirm that the spectral law persists in deep MLP-based UNets. Convolutional UNets, however, display rapid, near-simultaneous emergence of many modes, implicating local convolution in reshaping the learning dynamics. These results underscore how data covariance governs the order and speed with which diffusion models learn, and they call for deeper investigation of the unique inductive biases introduced by local convolution.
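The inverse-variance law can be illustrated with a toy calculation under stated assumptions; it is a standard linear-regression exercise, not the paper's derivation. A minimal sketch, assuming a single eigenmode with target variance λ, a single noise level σ, a bias-free scalar weight w, squared-error loss, and full-batch gradient flow with learning rate η: the weight relaxes exponentially toward λ/(λ+σ²) at rate 2η(λ+σ²), so for σ² ≪ λ the time to learn the mode scales as λ^{-1}. The function names are hypothetical.

```python
import numpy as np

# Hypothetical setup: one eigenmode with variance lam, one noise level sigma,
# a bias-free scalar weight w, and squared-error loss
#   L(w) = E[(w * (x + sigma * eps) - x) ** 2] = lam * (w - 1) ** 2 + sigma ** 2 * w ** 2
# with x ~ N(0, lam) and eps ~ N(0, 1).


def weight_trajectory(tau, lam, sigma, w0=0.0, eta=1.0):
    """Closed-form solution of the gradient flow dw/dtau = -eta * dL/dw.

    w relaxes exponentially toward w* = lam / (lam + sigma**2)
    at rate 2 * eta * (lam + sigma**2).
    """
    w_star = lam / (lam + sigma ** 2)
    rate = 2.0 * eta * (lam + sigma ** 2)
    return w_star + (w0 - w_star) * np.exp(-rate * np.asarray(tau))


def half_time(lam, sigma, eta=1.0):
    """Training time at which w has covered half the distance to w*."""
    return np.log(2.0) / (2.0 * eta * (lam + sigma ** 2))


if __name__ == "__main__":
    lams = np.logspace(-1, 2, 20)       # target variances spanning 3 decades
    taus = half_time(lams, sigma=1e-3)  # low-noise regime: sigma**2 << lam
    slope = np.polyfit(np.log(lams), np.log(taus), 1)[0]
    print(f"log-log slope of half-time vs variance: {slope:.4f}")  # close to -1
```

In this toy the 1/λ scaling holds only at noise levels with σ² ≪ λ; at higher noise the rate is dominated by σ². The law stated in the abstract concerns the variance of the generated distribution, which also depends on how the learned weights are integrated through the probability-flow ODE; the sketch covers only the learning side.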

Paper Structure

This paper contains 179 sections, 13 theorems, 379 equations, 29 figures, and 3 tables.

Key Result

Lemma 4.1

If the linear denoiser $\mathbf D(\mathbf{x};\sigma)=\mathbf{W}_{\sigma}\mathbf{x}+\mathbf b_{\sigma}$ satisfies $[\mathbf{W}_{\sigma},\mathbf{W}_{\sigma'}]=0$ for all $\sigma,\sigma'$, then for any $0<\sigma_{0}<\sigma_{T}$, …
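The statement of Lemma 4.1 is cut off in this summary, and its conclusion is not reconstructed here. As an illustration of why commuting weights make the probability-flow ODE tractable, here is a minimal sketch assuming the EDM-style PF-ODE dx/dσ = (x - D(x;σ))/σ and zero bias b_σ: commuting W_σ share eigenvectors, so each eigen-coordinate obeys a scalar linear ODE whose solution is an exponential of an integral over log σ. The demo plugs in the optimal Gaussian denoiser gain w(σ) = λ/(λ+σ²), for which the gain has the closed form sqrt((λ+σ0²)/(λ+σT²)). The function name `pf_ode_gain` is hypothetical.

```python
import numpy as np


def pf_ode_gain(w_fn, sigma0, sigmaT, n=20001):
    """Gain x(sigma0) / x(sigmaT) along one shared eigenvector of the weights.

    Assumption (hypothetical, for illustration): the EDM-style PF-ODE
        dx/dsigma = (x - D(x; sigma)) / sigma
    with a bias-free linear denoiser D(x; sigma) = W_sigma x. Commuting W_sigma
    share eigenvectors, so each eigen-coordinate obeys the scalar ODE
        dx/dsigma = (1 - w(sigma)) * x / sigma,
    i.e. d(log x)/d(log sigma) = 1 - w(sigma). Integrating from sigmaT down to sigma0:
        x(sigma0) = x(sigmaT) * exp(-int_{log sigma0}^{log sigmaT} (1 - w) d log sigma).
    """
    s = np.linspace(np.log(sigma0), np.log(sigmaT), n)
    f = 1.0 - w_fn(np.exp(s))
    integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(s))  # trapezoid rule
    return np.exp(-integral)


if __name__ == "__main__":
    lam, sigma0, sigmaT = 0.5, 1e-3, 80.0

    def w_opt(sig):
        # Optimal linear denoiser gain for a zero-mean Gaussian mode of variance lam.
        return lam / (lam + sig ** 2)

    gain = pf_ode_gain(w_opt, sigma0, sigmaT)
    exact = np.sqrt((lam + sigma0 ** 2) / (lam + sigmaT ** 2))
    print(gain, exact)  # numerical and closed-form gains agree closely
    # Generated variance when x(sigmaT) has variance sigmaT**2: approximately lam.
    print(sigmaT ** 2 * gain ** 2)
```

With x(σT) drawn with variance σT², the generated variance along the mode is σT² times the squared gain, which approaches λ when σ0² ≪ λ ≪ σT².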

Figures (29)

  • Figure 1: Spectral‑bias schematic. Learning and sampling together impose a variance‑ordered bias along covariance eigenmodes.
  • Figure 2: Learning dynamics per eigenmode. Top: one‑layer linear denoiser. Bottom: two‑layer symmetric denoiser. (A,D) Weight trajectories $\mathbf{u}_k^{\!\top}\mathbf{W}_\sigma(\tau)\mathbf{u}_k\ (\sigma\!=\!1)$. (B,E) Generated variance $\tilde{\lambda}_k$ versus target variance $\lambda_k$. (C,F) Power‑law relation between emergence time $\tau_k^{*}$ and $\lambda_k$.
  • Figure 3: Spectral Learning Dynamics of MLP-UNet (FFHQ32). A. Generated samples during training. B. Evolution of sample variance $\tilde{\lambda}_k(\tau)$ across eigenmodes during training. C. Heatmap of variance trajectories along all eigenmodes, with dots marking mode emergence times $\tau^*$ (first‐passage time at the geometric mean of initial and final variances). The gray zone (0.5–2× target variance) marks modes that start too close to their target, making their $\tau^*$ estimates unreliable. D. Power‐law scaling of $\tau^*$ versus target variance $\lambda_k$. Separate laws were fit for modes with increasing and decreasing variance, excluding the gray‐zone eigenmodes in the middle for stability.
  • Figure 4: Learning dynamics of the CNN-UNet differ (FFHQ32). A. Sample trajectory from the CNN-UNet. B. Variance evolution along covariance eigenmodes (cf. Fig. 3A, C).
  • Figure 5: Learning dynamics of the weight and of the generated-distribution variance per eigenmode (continued). Top: single-layer linear denoiser. Bottom: symmetric two-layer denoiser. A, C. Learning dynamics of $\mathbf{u}_k^\top\mathbf{W}(\tau)\mathbf{u}_k$. B, D. Learning dynamics of the generated variance $\tilde{\lambda}_k$ as a function of the target eigenmode variance $\lambda_k$. This case uses a larger-amplitude weight initialization, $Q_k=0.5$.
  • ...and 24 more figures
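The Figure 3 caption defines a mode's emergence time τ* as the first-passage time of its sample variance through the geometric mean of its initial and final variances, drops gray-zone modes (initial variance within 0.5–2× target), and fits power laws of τ* against λ_k. A minimal NumPy sketch of that pipeline follows. The function names, the use of the last checkpoint as the "final" variance, and the interpolation of log-variance between checkpoints are assumptions; the synthetic trajectories in the demo are made up for illustration and are not data from the paper.

```python
import numpy as np


def emergence_time(taus, var_traj):
    """First-passage time of a variance trajectory through the geometric mean of
    its initial and final values (criterion described in the Figure 3 caption).

    Assumptions: "final" is the last checkpoint, and the crossing is located by
    linear interpolation of log-variance between neighboring checkpoints.
    """
    taus = np.asarray(taus, dtype=float)
    logv = np.log(np.asarray(var_traj, dtype=float))
    log_thresh = 0.5 * (logv[0] + logv[-1])  # log of the geometric mean
    d = logv - log_thresh
    crossed = d >= 0 if logv[-1] > logv[0] else d <= 0
    i = int(np.argmax(crossed))  # first checkpoint at or past the threshold
    if i == 0:
        return taus[0]
    frac = d[i - 1] / (d[i - 1] - d[i])
    return taus[i - 1] + frac * (taus[i] - taus[i - 1])


def outside_gray_zone(init_var, target_var, lo=0.5, hi=2.0):
    """True for modes whose initial variance is NOT within [lo, hi] x target."""
    ratio = np.asarray(init_var) / np.asarray(target_var)
    return (ratio < lo) | (ratio > hi)


def fit_power_law(target_vars, tau_stars):
    """Least-squares fit of log tau* = log c + p * log lambda; returns (p, c)."""
    p, log_c = np.polyfit(np.log(target_vars), np.log(tau_stars), 1)
    return p, np.exp(log_c)


if __name__ == "__main__":
    # Synthetic (hypothetical) trajectories: v(tau) = lam + (v0 - lam) * exp(-lam * tau)
    taus = np.concatenate([[0.0], np.geomspace(1e-6, 50.0, 4000)])
    lams = np.logspace(0, 2, 10)
    v0s = 0.01 * lams  # every mode starts 100x below its target
    keep = outside_gray_zone(v0s, lams)
    t_star = np.array([
        emergence_time(taus, lam + (v0 - lam) * np.exp(-lam * taus))
        for lam, v0 in zip(lams[keep], v0s[keep])
    ])
    p, c = fit_power_law(lams[keep], t_star)
    print(f"fitted exponent p = {p:.3f}")  # close to -1 for this synthetic data
```

For these synthetic relaxations the crossing time is exactly ln(1.1)/λ, so the fitted exponent recovers -1; with real training checkpoints the estimate is only as fine as the checkpoint spacing.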

Theorems & Definitions (23)

  • Lemma 4.1: PF‑ODE solution for commuting weights
  • Proposition 4.2: Dynamics of generated distribution in one layer case
  • Proposition 5.1: Dynamics of weight and distribution in two layer linear model
  • Proposition 5.2
  • Proposition 5.3: Full‑width circular convolution learning dynamics
  • Proposition 5.4: Patch-convolution learning dynamics
  • Lemma C.1
  • proof
  • Remark C.2
  • Lemma C.3
  • ...and 13 more
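Proposition 5.3 treats full-width circular convolution mode by mode in the Fourier basis. A minimal sketch of the standard linear-algebra fact that makes such an analysis possible, assuming a 1-D signal of length n and a kernel as wide as the signal: the circulant matrix C[i, j] = k[(i - j) mod n] has the discrete Fourier vectors as eigenvectors, with eigenvalues given by the DFT of k. The helper names are hypothetical, and the sketch does not reproduce the proposition's learning-rate result.

```python
import numpy as np


def circulant_from_kernel(k):
    """Matrix of a full-width circular convolution: C[i, j] = k[(i - j) % n]."""
    k = np.asarray(k)
    n = k.shape[0]
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return k[idx]


def fourier_gains(k):
    """Eigenvalue of the circulant matrix on each Fourier mode: the DFT of k."""
    return np.fft.fft(k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8
    k = rng.standard_normal(n)  # hypothetical kernel
    x = rng.standard_normal(n)  # hypothetical signal
    C = circulant_from_kernel(k)
    # Matrix-vector product equals circular convolution computed via the FFT.
    print(np.allclose(C @ x, np.fft.ifft(np.fft.fft(k) * np.fft.fft(x)).real))  # True
    # Each Fourier vector f_m[j] = exp(2*pi*i*m*j/n) is an eigenvector of C.
    m = 3
    f_m = np.exp(2j * np.pi * m * np.arange(n) / n)
    print(np.allclose(C @ f_m, fourier_gains(k)[m] * f_m))  # True
```

Because every Fourier mode is decoupled, a circulant denoiser can be analyzed one Fourier mode at a time, in the same way the eigenmode decomposition is used for linear denoisers.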