
InSPECT: Invariant Spectral Features Preservation of Diffusion Models

Baohua Yan, Qingyuan Liu, Jennifer Kava, Xuan Di

TL;DR

InSPECT addresses a core limitation of diffusion models, namely that they diffuse data all the way to white noise, by preserving invariant spectral features in the Fourier domain. It introduces a spectral-space forward process that diffuses toward a class-conditioned Gaussian and a backward process that applies pixel-space denoisers after inverse Fourier transforms, so that key Fourier statistics are maintained throughout. The approach yields faster convergence, better generation quality and diversity, and effective class-conditioned generation, with significant FID/IS gains over DDPM on CIFAR-10, CelebA, and LSUN. The design is backed by theoretical analysis, including proofs for the backward posterior and a study of mean-value convergence, which supports the method’s soundness and suggests it may apply to other data modalities with spectral representations.
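The TL;DR does not give the forward schedule, so the following is only a minimal sketch under assumptions of ours: real and imaginary parts of the 2-D FFT are treated as separate components with a diagonal covariance $\mathbf{\Sigma} = \mathrm{diag}(\sigma^2)$, the noise level follows a DDPM-style linear $\beta$ schedule, and the marginal $\hat{\mathbf{x}}_t = \sqrt{\bar\alpha_t}\,\hat{\mathbf{x}}_0 + (1-\sqrt{\bar\alpha_t})\,\boldsymbol{\mu} + \sqrt{1-\bar\alpha_t}\,\mathbf{\Sigma}^{1/2}\boldsymbol{\epsilon}$ is chosen for illustration because it drifts toward the class-conditioned Gaussian $\mathcal{N}(\boldsymbol{\mu}, \mathbf{\Sigma})$ as $\bar\alpha_t \to 0$. The helper names (`to_spectral`, `to_pixel`, `linear_alpha_bar`, `forward_sample`) are hypothetical.

```python
import numpy as np

def to_spectral(img):
    """(H, W) real image -> stacked real/imaginary FFT coefficients, shape (2, H, W)."""
    coeffs = np.fft.fft2(img)
    return np.stack([coeffs.real, coeffs.imag])

def to_pixel(spec):
    """Inverse of to_spectral; the real part of the inverse FFT is kept."""
    return np.fft.ifft2(spec[0] + 1j * spec[1]).real

def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear schedule (assumed, not from the paper)."""
    return np.cumprod(1.0 - np.linspace(beta_start, beta_end, T))

def forward_sample(x0_hat, t, alpha_bar, mu, sigma, rng):
    """Draw x_hat_t from the assumed marginal q(x_hat_t | x_hat_0).

    The mean moves from x0_hat toward the class mean mu and the spread grows
    toward sigma, so x_hat_T is close to N(mu, diag(sigma**2)) when alpha_bar[T] ~ 0.
    Components with sigma == 0 (where x0_hat equals mu) never change.
    """
    a = alpha_bar[t]
    eps = rng.standard_normal(x0_hat.shape)
    return np.sqrt(a) * x0_hat + (1.0 - np.sqrt(a)) * mu + np.sqrt(1.0 - a) * sigma * eps
```

With this choice, subtracting $\boldsymbol{\mu}$ and dividing by $\sigma$ recovers the standard DDPM marginal, and any component with $\sigma = 0$ (for example the imaginary part of the DC term of a real image) stays at $\boldsymbol{\mu}$ for every $t$, which is one way invariant spectral features could survive the forward process.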

Abstract

Modern diffusion models (DMs) have achieved state-of-the-art image generation. However, the fundamental design choice of diffusing data all the way to white noise and then reconstructing it leads to an extremely difficult and computationally intractable prediction task. To overcome this limitation, we propose InSPECT (Invariant Spectral Feature-Preserving Diffusion Model), a novel diffusion model that preserves invariant spectral features during both the forward and backward processes. At the end of the forward process, the Fourier coefficients smoothly converge to a specified random noise, enabling feature preservation while maintaining diversity and randomness. By preserving invariant features, InSPECT achieves enhanced visual diversity, a faster convergence rate, and a smoother diffusion process. Experiments on CIFAR-10, CelebA, and LSUN show that, under the specified parameter settings, InSPECT achieves on average a 39.23% reduction in FID and a 45.80% improvement in IS relative to DDPM at 10K iterations. These results demonstrate the advantages of preserving invariant features: superior generation quality and diversity, improved computational efficiency, and faster convergence. To the best of our knowledge, this is the first attempt to analyze and preserve invariant spectral features in diffusion models.
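The abstract states that invariant spectral features are also kept during the backward process. As a hedged sketch, assume the forward marginal $\hat{\mathbf{x}}_t = \sqrt{\bar\alpha_t}\,\hat{\mathbf{x}}_0 + (1-\sqrt{\bar\alpha_t})\,\boldsymbol{\mu} + \sqrt{1-\bar\alpha_t}\,\sigma\boldsymbol{\epsilon}$ with a diagonal $\mathbf{\Sigma} = \mathrm{diag}(\sigma^2)$ and a linear $\beta$ schedule (both our assumptions, not formulas stated in the paper). In whitened coordinates $\mathbf{y} = (\hat{\mathbf{x}} - \boldsymbol{\mu})/\sigma$ this is exactly the DDPM marginal, so one ancestral step can reuse the DDPM posterior. The noise estimate `eps_hat` is simply an input here; in InSPECT it would come from a pixel-space denoiser applied after an inverse Fourier transform, which this sketch does not model. `make_schedule` and `backward_step` are hypothetical names.

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM-style schedule (an assumption; the paper's schedule may differ)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def backward_step(x_t, t, eps_hat, mu, sigma, schedule, rng):
    """One ancestral step from x_hat_t to x_hat_{t-1} (t is 0-indexed).

    Assumed forward marginal:
        x_t = sqrt(ab_t) x_0 + (1 - sqrt(ab_t)) mu + sqrt(1 - ab_t) sigma eps.
    In whitened coordinates y = (x - mu) / sigma this is the DDPM marginal,
    so the usual DDPM posterior mean and variance apply to y.
    Components with sigma == 0 are invariant and are returned as mu.
    """
    betas, alphas, alpha_bar = schedule
    active = sigma > 0
    y_t = np.zeros_like(x_t)
    y_t[active] = (x_t[active] - mu[active]) / sigma[active]
    # DDPM posterior mean expressed through the predicted noise eps_hat.
    y_prev = (y_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
    if t > 0:
        # Posterior variance beta_tilde_t; no noise is added at the final step.
        var = betas[t] * (1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t])
        y_prev = y_prev + np.sqrt(var) * rng.standard_normal(x_t.shape)
    x_prev = mu.copy()
    x_prev[active] = mu[active] + sigma[active] * y_prev[active]
    return x_prev
```

Because invariant components ($\sigma = 0$) are written back as $\boldsymbol{\mu}$ at every step, they cannot drift during sampling, regardless of what the denoiser predicts.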

Paper Structure

This paper contains 22 sections, 38 equations, 9 figures, 4 tables, and 2 algorithms.

Figures (9)

  • Figure 1: Visualization of the InSPECT forward and backward processes. The input data $\hat{\mathbf{x}}_0$ is not completely destroyed; invariant component features are retained within the random noise at $\hat{\mathbf{x}}_T$.
  • Figure 2: Heatmaps of the Fourier-coefficient mean $\boldsymbol{\mu}$ and standard deviation $\mathbf{\Sigma}^{1/2}$ in the spectral space of the MNIST dataset for digits 0–2. Cyan points mark invariant components, i.e., components with $\mathbf{\Sigma}^{1/2} = 0$.
  • Figure 3: Visualization of spectral random noise $\mathcal{N}(\boldsymbol{\mu}, \mathbf{\Sigma})$ for digits 0–2 in the MNIST dataset. We generate the random samples in spectral space and map them back to pixel space. Label-related features are preserved in the spectral space.
  • Figure 4: The graphical model of Invariant Spectral Feature Preservation in Diffusion Models (InSPECT) considered in this work. We convert a given image dataset $\mathbf{x}_0$ into Fourier coefficients, carry out the InSPECT forward and backward processes, and convert the Fourier coefficients back into $\mathbf{x}_0$.
  • Figure 5: Mean and standard deviation of the CIFAR-10 dataset. The $x$-axis is the frequency radius of the spectral coefficients. The curves shown are for the red channel and a single label.
  • ...and 4 more figures
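Figures 2, 3, and 5 are built from per-component statistics of Fourier coefficients. A minimal NumPy sketch of those three steps follows, assuming a diagonal $\mathbf{\Sigma}$ with real and imaginary parts treated as separate components, an invariance threshold `tol`, and integer-rounded frequency radii for the radial curves; none of these choices is taken from the paper, and the function names are hypothetical. Because real and imaginary parts are sampled independently, a sampled spectrum is generally not Hermitian-symmetric, so the sketch keeps only the real part of the inverse transform.

```python
import numpy as np

def spectral_stats(images, tol=1e-8):
    """Per-component mean and standard deviation of 2-D Fourier coefficients.

    images: array of shape (N, H, W), one channel of one class.
    Returns mu and std of shape (2, H, W) (real part, imaginary part) and a
    boolean mask of invariant components whose std is below tol.
    """
    coeffs = np.fft.fft2(images, axes=(-2, -1))
    spec = np.stack([coeffs.real, coeffs.imag], axis=1)  # (N, 2, H, W)
    mu = spec.mean(axis=0)
    std = spec.std(axis=0)
    return mu, std, std < tol

def sample_spectral_noise(mu, std, rng):
    """Sample N(mu, diag(std**2)) in spectral space and map it to pixel space."""
    spec = mu + std * rng.standard_normal(mu.shape)
    # Independent real/imaginary noise breaks Hermitian symmetry; keep the real part.
    return np.fft.ifft2(spec[0] + 1j * spec[1]).real

def radial_profile(values):
    """Average an (H, W) per-frequency map over rings of equal integer |frequency|."""
    h, w = values.shape
    fy = np.fft.fftfreq(h) * h  # integer frequency indices in FFT order
    fx = np.fft.fftfreq(w) * w
    radius = np.rint(np.hypot(fy[:, None], fx[None, :])).astype(int)
    sums = np.bincount(radius.ravel(), weights=values.ravel())
    counts = np.bincount(radius.ravel())
    # Empty rings (possible for some H, W) are reported as NaN instead of dividing by zero.
    return np.divide(sums, counts, out=np.full_like(sums, np.nan), where=counts > 0)
```

For a single colour channel of one class, `radial_profile(std[0])` or `radial_profile(np.hypot(mu[0], mu[1]))` would give curves analogous to those described for Figure 5.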