Spectral Dictionary Learning for Generative Image Modeling

Andrew Kiruluta

TL;DR

This work addresses the limitations of stochastic latent models by introducing a spectral dictionary learning approach for image synthesis. Images are represented as $\hat{\mathbf{x}} = \sum_{i=1}^K w_i s_i(t)$, where each spectral atom $s_i(t)$ is parameterized by time‑varying amplitude, frequency, and phase, and modulated by a small network to capture local spectral dynamics. The dictionary is learned jointly with per‑image mixing coefficients, and a simple probabilistic prior over $\mathbf{w}$ enables deterministic generation via a single linear synthesis step; an STFT‑based loss enforces both global structure and detailed spectral content. The model yields interpretable spectral components, stable training, and efficient sampling, achieving competitive CIFAR‑10 metrics (e.g., $\text{FID}=55.4$, $\text{IS}=7.2$) while offering a controllable alternative to GANs and diffusion models. This approach opens avenues for interpretable, spectrally‑driven image manipulation and analysis, with potential extensions to higher resolutions and richer priors.
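
The synthesis rule $\hat{\mathbf{x}} = \sum_{i=1}^K w_i s_i(t)$ and the sampling step can be illustrated with a minimal NumPy sketch. It rests on several simplifying assumptions that are not taken from the paper: the atoms are plain sinusoids with constant (not time-varying) amplitude, frequency, and phase, the modulation network is omitted, the prior over $\mathbf{w}$ is a single Gaussian, and the "training" coefficients are random stand-ins. The function names (`make_atoms`, `fit_gaussian_prior`, `sample_images`) are hypothetical.

```python
import numpy as np

def make_atoms(amplitude, frequency, phase, length):
    """Build a (K, length) dictionary of atoms s_i(t) = A_i * sin(2*pi*f_i*t + phi_i).

    Assumption: constant parameters per atom; the paper describes
    time-varying parameters modulated by a small network.
    """
    t = np.arange(length) / length  # normalized time in [0, 1)
    return amplitude[:, None] * np.sin(
        2.0 * np.pi * frequency[:, None] * t[None, :] + phase[:, None]
    )

def fit_gaussian_prior(coeffs):
    """Fit a single Gaussian to an (N, K) matrix of mixing coefficients."""
    mean = coeffs.mean(axis=0)
    cov = np.cov(coeffs, rowvar=False)  # (K, K) sample covariance
    return mean, cov

def sample_images(mean, cov, atoms, n, rng):
    """Draw n coefficient vectors and synthesize each with one linear step."""
    w = rng.multivariate_normal(mean, cov, size=n)  # (n, K)
    return w @ atoms                                # (n, length)

rng = np.random.default_rng(0)
K, T, N = 8, 32 * 32 * 3, 100  # atoms, flattened CIFAR-10 length, images

atoms = make_atoms(
    amplitude=np.ones(K),
    frequency=np.arange(1, K + 1, dtype=float),
    phase=np.zeros(K),
    length=T,
)

# Stand-in for learned per-image coefficients (hypothetical data).
train_coeffs = rng.normal(size=(N, K))

mean, cov = fit_gaussian_prior(train_coeffs)
samples = sample_images(mean, cov, atoms, n=4, rng=rng)
```

Once a coefficient vector is drawn, generation is a single matrix product with the dictionary, which is the sense in which the synthesis step is deterministic and linear.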

Abstract

We propose a novel spectral generative model for image synthesis that departs radically from the common variational, adversarial, and diffusion paradigms. In our approach, images, after being flattened into one-dimensional signals, are reconstructed as linear combinations of a set of learned spectral basis functions, where each basis is explicitly parameterized in terms of frequency, phase, and amplitude. The model jointly learns a global spectral dictionary with time-varying modulations and per-image mixing coefficients that quantify the contributions of each spectral component. Subsequently, a simple probabilistic model is fitted to these mixing coefficients, enabling the deterministic generation of new images by sampling from the latent space. This framework leverages deterministic dictionary learning, offering a highly interpretable and physically meaningful representation compared to methods relying on stochastic inference or adversarial training. Moreover, the incorporation of frequency-domain loss functions, computed via the short-time Fourier transform (STFT), ensures that the synthesized images capture both global structure and fine-grained spectral details, such as texture and edge information. Experimental evaluations on the CIFAR-10 benchmark demonstrate that our approach not only achieves competitive performance in terms of reconstruction quality and perceptual fidelity but also offers improved training stability and computational efficiency. This new type of generative model opens up promising avenues for controlled synthesis, as the learned spectral dictionary affords a direct handle on the intrinsic frequency content of the images, thus providing enhanced interpretability and potential for novel applications in image manipulation and analysis.
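
To make the frequency-domain loss concrete, the following is a minimal sketch of an STFT-based reconstruction objective on flattened images. The details are assumptions for illustration rather than the paper's settings: the frame length, hop size, Hann window, the use of magnitude spectra, and the weight `alpha` that combines the time-domain and STFT terms are all hypothetical choices, as are the function names.

```python
import numpy as np

def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude STFT of a 1-D signal using Hann-windowed frames.

    Returns an array of shape (n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_loss(x, x_hat, alpha=1.0):
    """Time-domain MSE plus alpha times the MSE between STFT magnitudes.

    Assumption: this additive form and the weight alpha are illustrative.
    """
    mse = np.mean((x - x_hat) ** 2)
    stft_term = np.mean((stft_magnitude(x) - stft_magnitude(x_hat)) ** 2)
    return mse + alpha * stft_term

rng = np.random.default_rng(0)
x = rng.normal(size=32 * 32 * 3)             # stand-in flattened image
x_noisy = x + 0.1 * rng.normal(size=x.shape)  # stand-in reconstruction

loss_same = spectral_loss(x, x)
loss_noisy = spectral_loss(x, x_noisy)
```

The STFT term compares local frequency content frame by frame, so it penalizes mismatches in texture and edges that a pixel-wise error alone can average out.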
