Spectral Regularization for Diffusion Models

Satish Chandran, Nicolas Roque dos Santos, Yunshu Wu, Greg Ver Steeg, Evangelos Papalexakis

TL;DR

This paper proposes a loss-level spectral regularization framework that augments standard diffusion training with differentiable Fourier- and wavelet-domain losses, without modifying the diffusion process, model architecture, or sampling procedure.

Abstract

Diffusion models are typically trained using pointwise reconstruction objectives that are agnostic to the spectral and multi-scale structure of natural signals. We propose a loss-level spectral regularization framework that augments standard diffusion training with differentiable Fourier- and wavelet-domain losses, without modifying the diffusion process, model architecture, or sampling procedure. The proposed regularizers act as soft inductive biases that encourage appropriate frequency balance and coherent multi-scale structure in generated samples. Our approach is compatible with DDPM, DDIM, and EDM formulations and introduces negligible computational overhead. Experiments on image and audio generation demonstrate consistent improvements in sample quality, with the largest gains observed on higher-resolution, unconditional datasets where fine-scale structure is most challenging to model.
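
To make the idea of a loss-level spectral regularizer concrete, the following is a minimal sketch for a 1-D signal, not the authors' implementation. It adds a Fourier amplitude term and a one-level Haar wavelet term to a pointwise MSE. The function names, the weights `lam_fourier` and `lam_wavelet`, the use of an L1 penalty on the Haar coefficients, and the naive O(n^2) DFT are all assumptions made for illustration; in actual training these terms would be computed with a differentiable FFT in an autodiff framework.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real 1-D signal."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def mse(pred, target):
    """Pointwise mean squared error (the standard reconstruction term)."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def fourier_amplitude_loss(pred, target):
    """Mean squared difference between DFT magnitudes (phase is ignored)."""
    amp_pred = [abs(c) for c in dft(pred)]
    amp_target = [abs(c) for c in dft(target)]
    return mse(amp_pred, amp_target)


def haar_level1(signal):
    """One level of the orthonormal Haar transform; assumes an even length."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_l1_loss(pred, target):
    """Mean absolute difference between one-level Haar coefficients.

    The L1 penalty is an illustrative assumption, not taken from the paper.
    """
    approx_p, detail_p = haar_level1(pred)
    approx_t, detail_t = haar_level1(target)
    coeffs_p = approx_p + detail_p
    coeffs_t = approx_t + detail_t
    return sum(abs(a - b) for a, b in zip(coeffs_p, coeffs_t)) / len(coeffs_p)


def regularized_loss(pred, target, lam_fourier=0.1, lam_wavelet=0.1):
    """Pointwise MSE plus weighted spectral terms (weights are hypothetical)."""
    return (
        mse(pred, target)
        + lam_fourier * fourier_amplitude_loss(pred, target)
        + lam_wavelet * haar_l1_loss(pred, target)
    )


# A 1-D "checkerboard" target and an over-smoothed (all-zero) prediction.
target = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
oversmoothed = [0.0] * 8
print(regularized_loss(oversmoothed, target))
```

Note that an amplitude-only term is blind to circular shifts of the signal, which is one reason a phase term (as in the amplitude-and-phase loss mentioned in the figure captions) can be useful alongside it.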

Paper Structure (43 sections, 24 equations, 10 figures, 3 tables)

This paper contains 43 sections, 24 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Checkerboard toy experiment. Panels (a) to (c) show, respectively, the ground-truth pattern, a sample from a model trained without spectral regularization, and a sample from a model trained with the proposed amplitude-and-phase loss.
  • Figure 2: Radially averaged power spectra (log scale) for the ground truth, baseline DDPM with MSE loss, and DDPM with our amplitude+phase spectral loss.
  • Figure 3: Generated AFHQ samples obtained by fine-tuning with the unweighted Fourier amplitude loss under different EDM formulations.
  • Figure 4: Generated AFHQ samples obtained by fine-tuning with the unweighted Fourier amplitude+phase loss under different EDM formulations.
  • Figure 5: Generated AFHQ samples obtained by fine-tuning with the unweighted Haar wavelet loss under different EDM formulations.
  • ...and 5 more figures
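
The radially averaged power spectrum mentioned in the Figure 2 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' evaluation code: the function names are hypothetical, radii are binned by rounding to the nearest integer frequency, and a naive 2-D DFT is used so the example stays dependency-free (it is only practical for very small images). The figure plots the spectrum on a log scale; the sketch returns raw power and leaves the logarithm to the caller.

```python
import cmath
import math


def dft2_power(image):
    """Power |F(u, v)|^2 of a naive 2-D DFT over a small real-valued image."""
    h, w = len(image), len(image[0])
    power = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2.0 * math.pi * (u * y / h + v * x / w)
                    acc += image[y][x] * cmath.exp(1j * angle)
            power[u][v] = abs(acc) ** 2
    return power


def radial_power_spectrum(image):
    """Average power over all frequency bins sharing a rounded radius.

    Returns a dict mapping integer radius -> mean power at that radius.
    """
    h, w = len(image), len(image[0])
    power = dft2_power(image)
    sums, counts = {}, {}
    for u in range(h):
        for v in range(w):
            # Map DFT indices to signed frequencies so radius 0 is the DC bin.
            fu = u if u <= h // 2 else u - h
            fv = v if v <= w // 2 else v - w
            r = int(round(math.hypot(fu, fv)))
            sums[r] = sums.get(r, 0.0) + power[u][v]
            counts[r] = counts.get(r, 0) + 1
    return {r: sums[r] / counts[r] for r in sorted(sums)}


# An 8x8 checkerboard concentrates its energy at the highest spatial frequency.
board = [[(-1.0) ** (x + y) for x in range(8)] for y in range(8)]
spectrum = radial_power_spectrum(board)
for radius, mean_power in spectrum.items():
    print(radius, mean_power)
```

Comparing such curves for ground-truth and generated samples shows at which radii a model under- or over-represents energy, which is the kind of frequency imbalance the spectral losses are meant to reduce.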