Spectral Analysis of Diffusion Models with Application to Schedule Design

Roi Benita, Michael Elad, Joseph Keshet

TL;DR

This work provides a theoretical spectral lens on diffusion-model inference by assuming Gaussian target distributions, deriving a closed-form spectral transfer function that maps input noise to output signals. Using this framework, the authors formulate and solve a data-aware noise-schedule optimization under Wasserstein-2 or KL divergence, applicable to DDIM and DDPM under VP/VE. They demonstrate that optimal spectral schedules align with dataset spectral content and often resemble known heuristics, while offering improved performance with few diffusion steps on synthetic and real datasets (e.g., CIFAR-10, MUSIC, SC09). The approach clarifies the link between the data spectrum and diffusion dynamics, enabling principled schedule design and potential speedups in synthesis without retraining denoisers. Overall, the paper provides a principled, frequency-domain method to tailor diffusion schedules to data characteristics, with practical benefits for sample quality and efficiency.
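As a rough illustration of why a Gaussian assumption makes a Wasserstein-2 objective tractable, the sketch below evaluates the standard closed-form squared Wasserstein-2 distance between two Gaussians whose covariances are diagonal in a shared eigenbasis. The function name `w2_squared_diagonal` and the shared-eigenbasis setting are assumptions made for this example; this is not the paper's objective or implementation.

```python
import numpy as np

def w2_squared_diagonal(mu_a, lam_a, mu_b, lam_b):
    """Squared Wasserstein-2 distance between N(mu_a, diag(lam_a)) and
    N(mu_b, diag(lam_b)).

    Assumption (for this sketch): both covariances are diagonal in the same
    basis, so the general Gaussian formula reduces to a per-eigenvalue sum.
    """
    mu_a, lam_a = np.asarray(mu_a, float), np.asarray(lam_a, float)
    mu_b, lam_b = np.asarray(mu_b, float), np.asarray(lam_b, float)
    mean_term = np.sum((mu_a - mu_b) ** 2)                       # ||mu_a - mu_b||^2
    spectral_term = np.sum((np.sqrt(lam_a) - np.sqrt(lam_b)) ** 2)
    return float(mean_term + spectral_term)
```

Under this assumption, the distance separates into a mean mismatch and a per-eigenvalue mismatch, which is one way to see how a schedule's effect on each spectral component could be scored independently.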

Abstract

Diffusion models (DMs) have emerged as powerful tools for modeling complex data distributions and generating realistic new samples. Over the years, advanced architectures and sampling methods have been developed to make these models practically usable. However, certain synthesis process decisions still rely on heuristics without a solid theoretical foundation. In our work, we offer a novel analysis of the DM's inference process, introducing a comprehensive frequency response perspective. Specifically, by relying on a Gaussianity assumption, we present the inference process as a closed-form spectral transfer function, capturing how the generated signal evolves in response to the initial noise. We demonstrate how the proposed analysis can be leveraged to design a noise schedule that aligns effectively with the characteristics of the data. The spectral perspective also provides insights into the underlying dynamics and sheds light on the relationship between spectral properties and noise schedule structure. Our results lead to scheduling curves that depend on the spectral content of the data, offering a theoretical justification for some of the heuristics adopted by practitioners.

Paper Structure

This paper contains 47 sections, 9 theorems, 164 equations, 30 figures, 4 tables.

Key Result

Theorem 3.1

Let $\mathbf{x}_0 \sim \mathcal{N}(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$ and let $\mathbf{x}_t$ be defined by the marginal distribution (eq:marginal_dist). Then, the denoised signal obtained from the MMSE (and the MAP) denoiser is given by:
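The theorem's expression is not reproduced in this excerpt, and the sketch below does not attempt to restate it. As a hedged illustration of the kind of closed form a Gaussian prior admits, it computes the textbook posterior mean $\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t]$ under the assumption that the marginal takes the common variance-preserving form $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. That marginal, the symbol `alpha_bar`, and the function name are assumptions of this example; the paper's eq:marginal_dist may be parametrized differently.

```python
import numpy as np

def gaussian_mmse_denoiser(x_t, mu0, sigma0, alpha_bar):
    """Posterior mean E[x_0 | x_t] for x_0 ~ N(mu0, sigma0).

    Assumed marginal (not taken from the paper):
        x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps,  eps ~ N(0, I).
    The posterior is Gaussian, so its mean (MMSE) and mode (MAP) coincide.
    """
    d = mu0.shape[0]
    a = np.sqrt(alpha_bar)
    # Covariance of x_t under the assumed marginal.
    cov_t = alpha_bar * sigma0 + (1.0 - alpha_bar) * np.eye(d)
    # Standard Gaussian conditioning: mu0 + Cov(x_0, x_t) Cov(x_t)^{-1} (x_t - E[x_t]).
    residual = x_t - a * mu0
    return mu0 + a * sigma0 @ np.linalg.solve(cov_t, residual)
```

The denoiser is linear in $\mathbf{x}_t$; in the eigenbasis of $\boldsymbol{\Sigma}_0$ it acts as an independent scalar gain on each eigen-direction, which is consistent with the per-frequency view the summary describes.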

Figures (30)

  • Figure 1: One panel presents the optimized spectral schedules for $\boldsymbol{\Sigma}_0 = A^TA$ with $d = 50$, $l = 0.1$, and $\boldsymbol{\mu}_0 = 0.05 \cdot \mathbf{1}_d$, obtained by minimizing the Wasserstein-2 distance for several numbers of diffusion steps, $S \in \{10, 28, 38, 60, 90, 112, 250, 334\}$. A second panel compares the spectral schedule (dotted red) for $S=112$ with heuristic alternatives: linear, EDM ($\rho = 7$), Cosine-based schedules such as Cosine ($s=0$, $e=1$, $\tau=1$) as in [nichol2021improved, chen2023importance], and Sigmoid-based schedules such as Sigmoid ($s=-3$, $e=3$, $\tau=1$) from [jabri2022scalable, chen2023importance]. Parametric estimations for the Cosine and Sigmoid schedules appear in brown and cyan, respectively. A third panel compares the Wasserstein-2 distance of the spectral recommendation with that of the baseline schedules across different step counts.
  • Figure 2: One panel shows the spectral schedules obtained by solving the optimization problem individually for each eigenvalue, with other contributions set to zero (note that the reverse process proceeds from right to left). A second panel illustrates the relative error of the 10 largest eigenvalues over the final 20 steps of a 60-step diffusion process using the Cosine $(0,1,1)$ schedule.
  • Figure 3: Comparison of the spectral schedules for the CIFAR-10 and MUSIC datasets with various heuristic noise schedules, using $112$ diffusion steps.
  • Figure 4: Two panels show the Wasserstein-2 distance of the proposed spectral noise schedule (in red) compared to existing heuristics, evaluated across various numbers of diffusion steps, for CIFAR-10 and MUSIC. A third panel presents the corresponding FID scores for CIFAR-10. Across all comparisons, the spectral schedule generally outperforms the heuristics, with a wider gap at lower step counts. For CIFAR-10, the approximation error is less pronounced, with results showing greater stability.
  • Figure 5: One panel compares the eigenvalues derived from the spectral and time-domain formulations of the DDIM method [ho2020denoising]. Both approaches use the dataset described in Scenario 1 (subsec:Scenario_1), with $l = 0.1$ and $d = 50$, $112$ diffusion steps, and the linear noise schedule proposed in [song2020denoising]. A second panel illustrates the absolute error between the estimated and original eigenvalues for both methods.
  • ...and 25 more figures
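
For readers unfamiliar with the heuristic baselines named in the captions above, the following sketch shows one common way to write the Cosine ($s$, $e$, $\tau$), Sigmoid ($s$, $e$, $\tau$), and EDM ($\rho$) schedules. The Cosine and Sigmoid forms follow the parametrization as recalled from chen2023importance, and the EDM defaults `sigma_min=0.002` and `sigma_max=80.0` are assumed values; neither the function names nor these defaults come from this paper, so the curves may differ in detail from those plotted in its figures.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cosine_schedule(t, start=0.0, end=1.0, tau=1.0, clip_min=1e-9):
    """Cosine-based curve on t in [0, 1], decreasing from 1 toward 0 (assumed form)."""
    t = np.asarray(t, dtype=float)
    v_start = np.cos(start * np.pi / 2) ** (2 * tau)
    v_end = np.cos(end * np.pi / 2) ** (2 * tau)
    out = np.cos((t * (end - start) + start) * np.pi / 2) ** (2 * tau)
    out = (v_end - out) / (v_end - v_start)   # rescale so t=0 -> 1 and t=1 -> 0
    return np.clip(out, clip_min, 1.0)

def sigmoid_schedule(t, start=-3.0, end=3.0, tau=1.0, clip_min=1e-9):
    """Sigmoid-based curve on t in [0, 1], decreasing from 1 toward 0 (assumed form)."""
    t = np.asarray(t, dtype=float)
    v_start = _sigmoid(start / tau)
    v_end = _sigmoid(end / tau)
    out = _sigmoid((t * (end - start) + start) / tau)
    out = (v_end - out) / (v_end - v_start)   # rescale so t=0 -> 1 and t=1 -> 0
    return np.clip(out, clip_min, 1.0)

def edm_sigmas(num_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM-style noise levels, interpolated linearly in sigma**(1/rho)."""
    ramp = np.linspace(0.0, 1.0, num_steps)
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    return (hi + ramp * (lo - hi)) ** rho     # runs from sigma_max down to sigma_min
```

For example, `cosine_schedule(np.linspace(0, 1, 112))` yields a 112-point curve of the kind the Figure 1 and Figure 3 captions compare against; reading its output as a signal level, rather than a noise level, is an interpretation made for this sketch.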

Theorems & Definitions (9)

  • Theorem 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Lemma 3.4
  • Theorem 3.5
  • Theorem 3.6
  • Theorem 4.1
  • Theorem 4.2
  • Lemma E.1