Compact Circulant Layers with Spectral Priors

Joseph Margaryan; Thomas Hamelryck

TL;DR

Spectral circulant/BCCB layers are effective compact building blocks in both Bayesian and point-estimate regimes. Across compact Bayesian neural networks on MNIST→Fashion-MNIST, variational heads on frozen CIFAR-10 features, and deterministic ViT projections on CIFAR-10/Tiny ImageNet, spectral layers match strong baselines while using substantially fewer parameters and yielding tighter Lipschitz certificates.

Abstract

Critical applications in areas such as medicine, robotics and autonomous systems require compact (i.e., memory-efficient), uncertainty-aware neural networks suitable for edge and other resource-constrained deployments. We study compact spectral circulant and block-circulant-with-circulant-blocks (BCCB) layers: FFT-diagonalizable circular convolutions whose weights live directly in the real FFT (RFFT) half (1D) or half-plane (2D). Parameterizing filters in the frequency domain lets us impose simple spectral structure, perform structured variational inference in a low-dimensional weight space, and calculate exact layer spectral norms, enabling inexpensive global Lipschitz bounds and margin-based robustness diagnostics. By placing independent complex Gaussians on the Hermitian support, we obtain a discrete instance of the spectral representation of stationary kernels, inducing an exact stationary Gaussian-process prior over filters on the discrete circle/torus. We exploit this to define a practical spectral prior and a Hermitian-aware low-rank-plus-diagonal variational posterior in real coordinates. Empirically, spectral circulant/BCCB layers are effective compact building blocks in both (variational) Bayesian and point-estimate regimes. Across compact Bayesian neural networks on MNIST→Fashion-MNIST, variational heads on frozen CIFAR-10 features, and deterministic ViT projections on CIFAR-10/Tiny ImageNet, spectral layers match strong baselines while using substantially fewer parameters and yielding tighter Lipschitz certificates.
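
To make the 1D case concrete, the following is a minimal NumPy sketch of a circulant layer whose weights are stored in the RFFT half, together with its exact spectral norm. It is an illustration under stated assumptions, not the authors' implementation: the function names (`spectral_circulant_forward`, `circulant_spectral_norm`, `dense_circulant`) are hypothetical, the layer is a single square circulant map without bias or channel mixing, and the DC bin (and the Nyquist bin for even `n`) is assumed to be real.

```python
import numpy as np

def spectral_circulant_forward(w_hat, x):
    """Apply y = C x, where C is the circulant matrix with first column irfft(w_hat).

    w_hat: complex array of length n//2 + 1 (the RFFT half of the filter).
    x:     real array whose last axis has length n.
    """
    n = x.shape[-1]
    # Circular convolution is a pointwise product in the frequency domain.
    return np.fft.irfft(w_hat * np.fft.rfft(x, axis=-1), n=n, axis=-1)

def circulant_spectral_norm(w_hat):
    """Exact spectral norm of C: the largest frequency-response magnitude.

    A circulant matrix is normal, so its singular values are |w_hat_k|;
    Hermitian symmetry means the RFFT half already contains every magnitude.
    Assumes the DC (and Nyquist, for even n) entries are real.
    """
    return float(np.max(np.abs(w_hat)))

def dense_circulant(w_hat, n):
    """Materialize C explicitly (for checking only): C[i, j] = w[(i - j) mod n]."""
    w = np.fft.irfft(w_hat, n=n)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return w[idx]

# Small self-check with an illustrative size n = 8.
n = 8
rng = np.random.default_rng(0)
w_hat = rng.normal(size=n // 2 + 1) + 1j * rng.normal(size=n // 2 + 1)
w_hat[0] = w_hat[0].real      # DC bin must be real
w_hat[-1] = w_hat[-1].real    # Nyquist bin must be real when n is even
x = rng.normal(size=n)
C = dense_circulant(w_hat, n)
y_fft = spectral_circulant_forward(w_hat, x)
```

The FFT path stores roughly `n` real numbers instead of the `n * n` entries of a dense matrix, and the norm comes from a single `max` over the stored spectrum rather than from an SVD.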

Paper Structure (74 sections, 5 theorems, 80 equations, 8 figures, 10 tables, 2 algorithms)

Introduction
Contributions
Related Work
Structured linear operators and circulant layers.
Spectral kernels and Gaussian processes.
Bayesian neural networks and structured variational inference.
Background: Circulant/BCCB Structure and the FFT
Spectral Circulant and BCCB Layers
1D spectral-circulant layers
2D spectral BCCB layers with channel mixing
Band-limiting
Backpropagation and complexity
Spectral Prior and Hermitian-aware SVI
Implementation notes
Software.
...and 59 more sections

Key Result

Theorem 1

Under the construction above, $\mathbf{w}=(w_0,\ldots,w_{n-1})^\top$ is jointly Gaussian, mean-zero, with stationary covariance which depends only on the difference $\tau$ modulo $n$.
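
The stationarity claim can be checked numerically. The sketch below assumes one plausible reading of the construction, which is not reproduced in this summary: independent zero-mean Gaussians on the RFFT half with per-frequency variances `s[k]`, where interior bins split the variance equally between real and imaginary parts and the DC (and Nyquist, for even `n`) bin is real. The helper names `prior_covariance` and `is_stationary` and the example variances are hypothetical. Because the map from the real Gaussian coordinates to $\mathbf{w}$ is linear, the covariance is computed exactly rather than by sampling.

```python
import numpy as np

def prior_covariance(s, n):
    """Exact covariance of w = irfft(w_hat) for independent Gaussian bins.

    s: variances for the n//2 + 1 RFFT bins (assumed parameterization).
    Builds the linear map A from unit-variance real coordinates to w,
    then returns A @ A.T.
    """
    m = n // 2 + 1
    cols = []
    for k in range(m):
        real_only = (k == 0) or (n % 2 == 0 and k == n // 2)
        e = np.zeros(m, dtype=complex)
        if real_only:
            e[k] = np.sqrt(s[k])
            cols.append(np.fft.irfft(e, n=n))
        else:
            scale = np.sqrt(s[k] / 2.0)
            e[k] = scale                      # real-part coordinate
            cols.append(np.fft.irfft(e, n=n))
            e[k] = 1j * scale                 # imaginary-part coordinate
            cols.append(np.fft.irfft(e, n=n))
    A = np.stack(cols, axis=1)
    return A @ A.T

def is_stationary(C):
    """True if C[t, t'] depends only on (t - t') mod n, i.e. C is circulant."""
    n = C.shape[0]
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return bool(np.allclose(C, C[:, 0][idx]))

# Illustrative spectrum for n = 8 (values chosen arbitrarily).
n = 8
s = np.array([1.0, 0.8, 0.5, 0.3, 0.1])
C = prior_covariance(s, n)
stationary = is_stationary(C)
```

Under this parameterization the first column of `C` is a cosine series in the lag $\tau$ with nonnegative coefficients proportional to `s`, which is the discrete analogue of the spectral representation of a stationary kernel.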

Figures (8)

Figure 1: Predictive-entropy KDEs for MNIST (ID; blue) versus Fashion-MNIST (OOD; orange), computed from the SVI posterior predictive. Left: Spectral BCCB (ours). Right: Conv2D baseline. In each image, the left panel shows the full entropy range and the right panel shows the same zoom window $H\in[0,0.12]$ nats.
Figure 2: Train and validation cross-entropy over epochs for dense vs spectral ViT on Tiny ImageNet. Vertical lines mark best-validation checkpoints.
Figure 3: Training dynamics with log-scaled $y$-axis. Negative ELBO / loss for MNIST (1D and 2D models). SpectralBCCB stabilizes earlier in all settings.
Figure 4: Predictive-entropy KDEs on CIFAR-10 (ID, blue) vs CIFAR-10-C (OOD, severity 5, orange) for frozen-feature heads.
Figure 5: Training dynamics of the global Lipschitz upper bound $\widehat{\operatorname{Lip}}(f_{\bm{\theta}})=\prod_{\ell=1}^{L}\|\mathbf{W}_\ell(\bm{\theta})\|_2$ for Bayesian heads on frozen CIFAR-10 features. The SpectralCirculant1d head stays in a substantially lower Lipschitz regime throughout SVI compared to the Dense head.
...and 3 more figures
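
The product bound in the Figure 5 caption can be sketched in a few lines. This is an illustrative example, not the paper's code: `lipschitz_upper_bound` is a hypothetical helper, the weights are random dense matrices, and the bound is only valid when every activation between layers is 1-Lipschitz (for example ReLU), which is assumed here.

```python
import numpy as np

def lipschitz_upper_bound(weights):
    """Product of layer spectral norms, prod_l ||W_l||_2.

    Upper-bounds the global Lipschitz constant of a feed-forward network
    under the assumption that all activations are 1-Lipschitz.
    """
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))

# Two-layer toy network f(x) = W2 @ relu(W1 @ x) with arbitrary sizes.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
bound = lipschitz_upper_bound([W1, W2])

def f(x):
    return W2 @ np.maximum(W1 @ x, 0.0)
```

For a circulant layer the spectral norm does not require an SVD: it equals the largest magnitude of the filter's frequency response, so the same product can be evaluated directly from the stored spectra.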

Theorems & Definitions (9)

Theorem 1: Discrete spectral GP prior
Proposition 2: Prior high-probability bound for spectral-circulant norms
Definition 3: Spectral–circulant sampling
Remark 4: Second–order structure
Theorem 5: GP–equivalence on $\mathbb Z_n$
proof
Corollary 6: Discrete Bochner/Herglotz
proof: Proof of Proposition 2
Corollary 7: Prior typical-case control of the product Lipschitz bound
