Compact Circulant Layers with Spectral Priors

Joseph Margaryan, Thomas Hamelryck

TL;DR

Spectral circulant/BCCB layers are effective compact building blocks in both Bayesian and point-estimate regimes. Across compact Bayesian neural networks on MNIST→Fashion-MNIST, variational heads on frozen CIFAR-10 features, and deterministic ViT projections on CIFAR-10/Tiny ImageNet, spectral layers match strong baselines while using substantially fewer parameters and yielding tighter Lipschitz certificates.

Abstract

Critical applications in areas such as medicine, robotics, and autonomous systems require compact (i.e., memory-efficient), uncertainty-aware neural networks suitable for edge and other resource-constrained deployments. We study compact spectral circulant and block-circulant-with-circulant-blocks (BCCB) layers: FFT-diagonalizable circular convolutions whose weights live directly in the real FFT (RFFT) half (1D) or half-plane (2D). Parameterizing filters in the frequency domain lets us impose simple spectral structure, perform structured variational inference in a low-dimensional weight space, and calculate exact layer spectral norms, enabling inexpensive global Lipschitz bounds and margin-based robustness diagnostics. By placing independent complex Gaussians on the Hermitian support, we obtain a discrete instance of the spectral representation of stationary kernels, inducing an exact stationary Gaussian-process prior over filters on the discrete circle/torus. We exploit this to define a practical spectral prior and a Hermitian-aware low-rank-plus-diagonal variational posterior in real coordinates. Empirically, spectral circulant/BCCB layers are effective compact building blocks in both (variational) Bayesian and point-estimate regimes. Across compact Bayesian neural networks on MNIST→Fashion-MNIST, variational heads on frozen CIFAR-10 features, and deterministic ViT projections on CIFAR-10/Tiny ImageNet, spectral layers match strong baselines while using substantially fewer parameters and yielding tighter Lipschitz certificates.
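
As a rough illustration of the layer described in the abstract, the following NumPy sketch stores a 1D circulant layer as its RFFT half, applies it with FFTs, and reads the spectral norm off the stored coefficients. The function names (`circulant_apply`, `circulant_spectral_norm`), the size `n = 8`, and the random coefficients are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def circulant_apply(w_hat, x):
    """Circular convolution y = C x, where w_hat is the RFFT half of the filter."""
    n = x.shape[-1]
    return np.fft.irfft(np.fft.rfft(x) * w_hat, n=n)

def circulant_spectral_norm(w_hat):
    """Spectral norm of the circulant matrix: the largest eigenvalue magnitude."""
    return float(np.abs(w_hat).max())

rng = np.random.default_rng(0)
n = 8  # even length, so the RFFT half has n // 2 + 1 bins

# Hypothetical weights: n // 2 + 1 complex coefficients.
# For a real filter of even length, the DC and Nyquist bins must be real.
w_hat = rng.normal(size=n // 2 + 1) + 1j * rng.normal(size=n // 2 + 1)
w_hat[0] = w_hat[0].real
w_hat[-1] = w_hat[-1].real

# Dense reference: C[i, j] = w[(i - j) mod n], with w the spatial-domain filter.
w = np.fft.irfft(w_hat, n=n)
idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
C = w[idx]

x = rng.normal(size=n)
y_fft = circulant_apply(w_hat, x)   # O(n log n), n real parameters
y_dense = C @ x                     # O(n^2), n^2 entries

print(np.allclose(y_fft, y_dense))
print(circulant_spectral_norm(w_hat), np.linalg.norm(C, 2))
```

Because a circulant matrix is normal, its singular values are the magnitudes of its DFT eigenvalues, so the maximum over the stored half-spectrum agrees with the dense spectral norm without any iterative estimation.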

Paper Structure

This paper contains 74 sections, 5 theorems, 80 equations, 8 figures, 10 tables, 2 algorithms.

Key Result

Theorem 1

Under the construction above, $\mathbf{w}=(w_0,\ldots,w_{n-1})^\top$ is jointly Gaussian with zero mean and a stationary covariance that depends only on the index difference $\tau$ modulo $n$.
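
The stationarity claim can be checked numerically without sampling. The sketch below is one possible reading of the construction, under stated assumptions: even $n$, real Gaussians on the DC and Nyquist bins, complex Gaussians with equal real and imaginary variance on the interior bins, and a hypothetical decaying scale `sigma`. The helper `hermitian_half_from_real` and its coordinate layout are illustrative choices, not taken from the paper. Since the filter is a linear image $\mathbf{w} = A\mathbf{z}$ of standard normal coordinates $\mathbf{z}$, its covariance is exactly $AA^\top$.

```python
import numpy as np

def hermitian_half_from_real(z, sigma):
    """Map n real coordinates z to the RFFT half of a real length-n filter (n even).

    Assumed layout of z: [a_0, a_1, b_1, ..., a_{m-1}, b_{m-1}, a_m] with m = n // 2.
    """
    n = z.shape[0]
    m = n // 2
    w_hat = np.zeros(m + 1, dtype=complex)
    w_hat[0] = sigma[0] * z[0]    # DC bin is real
    w_hat[m] = sigma[m] * z[-1]   # Nyquist bin is real
    a, b = z[1:-1:2], z[2:-1:2]   # real and imaginary parts of interior bins
    w_hat[1:m] = sigma[1:m] * (a + 1j * b) / np.sqrt(2.0)
    return w_hat

n = 8
k = np.arange(n // 2 + 1)
sigma = np.exp(-0.5 * k)  # hypothetical per-frequency prior scale

# Column j of A is the filter produced by the j-th unit coordinate vector.
A = np.stack(
    [np.fft.irfft(hermitian_half_from_real(e, sigma), n=n) for e in np.eye(n)],
    axis=1,
)
cov = A @ A.T  # exact covariance of w = A z with z ~ N(0, I)

# Stationarity: shifting both indices by one (mod n) leaves the covariance unchanged.
is_stationary = np.allclose(cov, np.roll(cov, shift=(1, 1), axis=(0, 1)))
print(is_stationary)
```

Under these assumptions the covariance at lag $\tau$ works out to $\frac{1}{n^2}\big[\sigma_0^2 + 2\sum_{k=1}^{n/2-1}\sigma_k^2\cos(2\pi k\tau/n) + \sigma_{n/2}^2(-1)^\tau\big]$, a function of $\tau$ modulo $n$ alone.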

Figures (8)

  • Figure 1: Predictive-entropy KDEs for MNIST (ID; blue) versus Fashion-MNIST (OOD; orange), computed from the SVI posterior predictive. Left: Spectral BCCB (ours). Right: Conv2D baseline. In each image, the left panel shows the full entropy range and the right panel shows a common zoom window $H\in[0,0.12]$ nats.
  • Figure 2: Train and validation cross-entropy over epochs for dense vs spectral ViT on Tiny ImageNet. Vertical lines mark best-validation checkpoints.
  • Figure 3: Training dynamics with log-scaled $y$-axis. Negative ELBO / loss for MNIST (1D and 2D models). SpectralBCCB stabilizes earlier in all settings.
  • Figure 4: Predictive-entropy KDEs on CIFAR-10 (ID, blue) vs CIFAR-10-C (OOD, severity 5, orange) for frozen-feature heads.
  • Figure 5: Training dynamics of the global Lipschitz upper bound $\widehat{\operatorname{Lip}}(f_{\bm{\theta}})=\prod_{\ell=1}^{L}\|\mathbf{W}_\ell(\bm{\theta})\|_2$ for Bayesian heads on frozen CIFAR-10 features. The SpectralCirculant1d head stays in a substantially lower Lipschitz regime throughout SVI compared to the Dense head.
  • ...and 3 more figures
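
The product bound in the Figure 5 caption can be sketched as follows, assuming a feed-forward stack with 1-Lipschitz activations (for example ReLU) between layers. The layer encoding as `(kind, params)` tuples and the toy weights are hypothetical and chosen only for illustration.

```python
import numpy as np

def layer_spectral_norm(layer):
    """Spectral norm of one layer given as a (kind, params) tuple."""
    kind, params = layer
    if kind == "circulant":
        # params is the RFFT half; the norm is the largest coefficient magnitude.
        return float(np.abs(params).max())
    if kind == "dense":
        # params is a weight matrix; ord=2 gives the largest singular value.
        return float(np.linalg.norm(params, 2))
    raise ValueError(f"unknown layer kind: {kind}")

def lipschitz_upper_bound(layers):
    """Product of layer spectral norms; valid when activations are 1-Lipschitz."""
    return float(np.prod([layer_spectral_norm(layer) for layer in layers]))

# Toy two-layer head: a length-4 circulant layer followed by a 2x4 dense layer.
layers = [
    ("circulant", np.array([2.0, 1.0 + 1.0j, 0.5])),
    ("dense", np.array([[3.0, 0.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0, 0.0]])),
]
bound = lipschitz_upper_bound(layers)
print(bound)
```

For the circulant layer the norm costs one pass over the stored coefficients, whereas the dense layer needs a singular value computation, which is part of why such a bound is inexpensive to track during training.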

Theorems & Definitions (9)

  • Theorem 1: Discrete spectral GP prior
  • Proposition 2: Prior high-probability bound for spectral-circulant norms
  • Definition 3: Spectral-circulant sampling
  • Remark 4: Second-order structure
  • Theorem 5: GP-equivalence on $\mathbb{Z}_n$
  • proof
  • Corollary 6: Discrete Bochner/Herglotz
  • proof: Proof of Proposition 2
  • Corollary 7: Prior typical-case control of the product Lipschitz bound