InSPECT: Invariant Spectral Features Preservation of Diffusion Models
Baohua Yan, Qingyuan Liu, Jennifer Kava, Xuan Di
TL;DR
Standard diffusion models diffuse data all the way to white noise. InSPECT addresses this core limitation by preserving invariant spectral features in the Fourier domain. It introduces a spectral-space forward process that diffuses toward a class-conditioned Gaussian, and a backward process that applies pixel-space denoisers after an inverse Fourier transform, maintaining key Fourier statistics throughout. The approach yields faster convergence, improved generation quality and diversity, and effective class-conditioned generation. On CIFAR-10, CelebA, and LSUN it achieves on average a 39.23% reduction in FID and a 45.80% improvement in IS over DDPM at 10K iterations. Theoretical analyses accompany the design, including backward-posterior proofs and mean-value convergence considerations, which support the method's soundness and its potential applicability to other data modalities with spectral representations.
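To make the idea of a spectral-space forward process concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the linear beta schedule, the mean-shifted marginal z_t = sqrt(a_t) z_0 + (1 - sqrt(a_t)) mu + sqrt(1 - a_t) eps, and the names `make_alpha_bar`, `spectral_forward`, and `mu_hat` are all assumptions chosen for illustration. The sketch only shows how Fourier coefficients can drift toward a Gaussian with a non-zero, class-dependent mean instead of toward zero-mean white noise.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear DDPM-style schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def spectral_forward(image, mu_hat, t, alpha_bar, rng):
    """Sample noisy Fourier coefficients z_t for a real-valued 2D image.

    Assumed (hypothetical) marginal, not taken from the paper:
        z_t = sqrt(a_t) * z_0 + (1 - sqrt(a_t)) * mu_hat + sqrt(1 - a_t) * eps_hat
    so z_t starts near z_0 and ends near a Gaussian centred on mu_hat.
    """
    # Unitary FFT, so white noise stays white in the spectral domain.
    z0 = np.fft.fft2(image, norm="ortho")
    # Transforming real pixel-space noise keeps Hermitian symmetry,
    # which means the inverse transform of z_t is still a real image.
    eps_hat = np.fft.fft2(rng.standard_normal(image.shape), norm="ortho")
    a = alpha_bar[t]
    return np.sqrt(a) * z0 + (1.0 - np.sqrt(a)) * mu_hat + np.sqrt(1.0 - a) * eps_hat


# Usage with placeholder data: a random "image" and a flat class-mean image.
rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(32, 32))
class_mean_image = np.full((32, 32), 0.5)  # hypothetical class statistic
mu_hat = np.fft.fft2(class_mean_image, norm="ortho")
alpha_bar = make_alpha_bar()

z_last = spectral_forward(image, mu_hat, len(alpha_bar) - 1, alpha_bar, rng)
x_last = np.fft.ifft2(z_last, norm="ortho").real  # back to pixel space
```

Under these assumptions, the final-step coefficients are distributed approximately around `mu_hat` with unit variance, so the class-dependent spectral mean survives the forward process. A pixel-space denoiser could then be applied to `x_last` after the inverse transform, in the spirit of the backward process described above.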
Abstract
Modern diffusion models (DMs) have achieved state-of-the-art image generation. However, the fundamental design choice of diffusing data all the way to white noise and then reconstructing it leads to an extremely difficult and computationally intractable prediction task. To overcome this limitation, we propose InSPECT (Invariant Spectral Feature-Preserving Diffusion Model), a novel diffusion model that preserves invariant spectral features during both the forward and backward processes. At the end of the forward process, the Fourier coefficients smoothly converge to a specified random noise distribution, which enables feature preservation while maintaining diversity and randomness. By preserving invariant features, InSPECT demonstrates enhanced visual diversity, a faster convergence rate, and a smoother diffusion process. Experiments on CIFAR-10, CelebA, and LSUN show that, under specified parameter settings, InSPECT achieves on average a 39.23% reduction in FID and a 45.80% improvement in IS against DDPM at 10K iterations. These results highlight the advantages of preserving invariant features: superior generation quality and diversity, improved computational efficiency, and faster convergence. To the best of our knowledge, this is the first attempt to analyze and preserve invariant spectral features in diffusion models.
