Spectral Gating Networks

Jusheng Zhang, Yijia Fan, Kaitong Cai, Jing Yang, Yongsen Zheng, Kwok-Yan Lam, Liang Lin, Keze Wang

TL;DR

This paper introduces Spectral Gating Networks (SGN), a drop-in spectral reparameterization that consistently improves accuracy-efficiency trade-offs under comparable computational budgets, achieving 93.15% accuracy on CIFAR-10 and up to 11.7x faster inference than spline-based KAN variants.

Abstract

Gating mechanisms are ubiquitous, yet a complementary question in feed-forward networks remains under-explored: how can frequency-rich expressivity be introduced without sacrificing stability and scalability? This tension is exposed by spline-based Kolmogorov-Arnold Network (KAN) parameterizations, where grid refinement can induce parameter growth and brittle optimization in high dimensions. To inject spectral capacity into existing MLP/FFN layers in a stability-preserving way, under fixed parameter and training budgets, we introduce Spectral Gating Networks (SGN), a drop-in spectral reparameterization. SGN augments a standard activation pathway with a compact spectral pathway and learnable gates that allow the model to start from a stable base behavior and progressively allocate capacity to spectral features during training. The spectral pathway is instantiated with trainable Random Fourier Features (learned frequencies and phases), replacing grid-based splines and removing resolution dependence. A hybrid GELU-Fourier formulation further improves optimization robustness while enhancing high-frequency fidelity. Across vision, NLP, audio, and PDE benchmarks, SGN consistently improves accuracy-efficiency trade-offs under comparable computational budgets, achieving 93.15% accuracy on CIFAR-10 and up to 11.7x faster inference than spline-based KAN variants. Code and trained models will be released.
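
The two-pathway design described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `SpectralGatedActivation`, the attribute names (`W_r`, `b_r`, `A_r`, `gate_base`, `gate_spec`), the per-channel additive gating form, the 1/sqrt(m) feature normalization, and the zero initialization of the spectral gate are all assumptions chosen so that the layer starts out as a plain GELU, in the spirit of "start from a stable base behavior". Only the forward pass is shown; in a real model every array would be a trainable parameter.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of GELU, used here as the base activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class SpectralGatedActivation:
    """Hypothetical sketch: y = gate_base * GELU(u) + gate_spec * (gamma(u) @ A_r)."""

    def __init__(self, dim, m, seed=0, freq_scale=1.0):
        rng = np.random.default_rng(seed)
        self.m = m
        # Frequencies and phases of the Fourier features (trainable in a real model).
        self.W_r = rng.normal(0.0, freq_scale, size=(dim, m))
        self.b_r = rng.uniform(0.0, 2.0 * np.pi, size=m)
        # Projection from the 2m Fourier features back to the layer width.
        self.A_r = rng.normal(0.0, 1.0 / np.sqrt(2 * m), size=(2 * m, dim))
        # Assumed gate initialization: base path fully on, spectral path off.
        self.gate_base = np.ones(dim)
        self.gate_spec = np.zeros(dim)

    def fourier_features(self, u):
        # gamma(u): cosines and sines of the projected input, shape (..., 2m).
        z = u @ self.W_r + self.b_r
        return np.concatenate([np.cos(z), np.sin(z)], axis=-1) / np.sqrt(self.m)

    def __call__(self, u):
        base = gelu(u)
        spectral = self.fourier_features(u) @ self.A_r
        return self.gate_base * base + self.gate_spec * spectral

# Usage: at initialization the layer behaves like GELU; opening the spectral
# gate mixes in the Fourier pathway.
layer = SpectralGatedActivation(dim=8, m=16)
u = np.random.default_rng(1).normal(size=(4, 8))
y_init = layer(u)
layer.gate_spec[:] = 0.5
y_mixed = layer(u)
```

Under these assumptions the cost of the spectral pathway depends only on the layer width and the number of features m, not on any grid resolution, which is the property the abstract contrasts with spline-based KANs.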

Paper Structure

This paper contains 102 sections, 1 theorem, 68 equations, 11 figures, 18 tables.

Key Result

Proposition 8.1

Let $\gamma(u)$ be the Fourier feature map used in SGN with spectral budget $m$: where $\omega_j$ denotes the $j$-th column of $W_r$ and $b_j$ the corresponding phase. Consider the SGN activation operator with base activation $\phi(\cdot)$, gate $G(\cdot)$, and trainable projection $A_r$. Then, for each output channel (coordinate) $k\in\{1,\ldots,d_{\mathrm{ff}}\}$, the scalar function class ind

Figures (11)

  • Figure 1: Conceptual illustration and theoretical analysis of the Spectral-Efficiency gap. (a) Overcoming Spectral Bias: Illustrating Problem 1. Standard MLPs (grey dashed) suffer from exponential convergence decay as frequency $\omega$ increases, creating a "spectral gap" (hatched area). SGN (green solid) maintains robust learnability across the spectrum. (b) Breaking the Efficiency Bottleneck: Illustrating Problem 2. While spline-based KANs (orange) incur linear cost scaling with grid resolution $G$ (creating an efficiency bottleneck at high $\Omega$), SGN maintains constant complexity, enabling scalable high-frequency modeling.
  • Figure 2: Architecture Comparison: MLP vs. SGN. (Left) A standard MLP layer relies on global weights and fixed activations, primarily capturing low-frequency components. (Right) The proposed SGN architecture introduces a parallel Spectral Branch. Features are split into a stable low-frequency path (Base Activation) and a high-frequency path (Sine/Cosine modulations via RFF). A learnable gate adaptively combines the two branches, achieving expressive detail modeling without sacrificing MLP stability.
  • Figure 3: Performance comparison of different models (KAN, MLP, GPKAN, FAN, SGN) on simple networks across multiple datasets. The results show that SGN typically achieves higher accuracy with fewer parameters.
  • Figure 4: Performance comparison of various models (KAN, GPKAN, MLP, FAN, SGN) across NLP, audio, and ML datasets. SGN consistently outperforms other models, achieving higher accuracy with fewer parameters, especially on datasets such as Bean, Rice, and AG News. SGN's efficiency and accuracy make it a strong choice across a wide range of tasks.
  • Figure 5: This experiment compares different models (KAN, GPKAN, MLP, FAN, SGN) on various function approximation tasks, analyzing test RMSE versus the number of parameters. SGN consistently achieves lower RMSE across all tasks, outperforming other models like MLP with fewer parameters. Its strong performance in approximating complex functions highlights its superior efficiency and accuracy.
  • ...and 6 more figures

Theorems & Definitions (3)

  • Proposition 8.1: Spectral Basis Containment (Rigorous Bandwidth Lower Bound)
  • proof
  • Remark 8.2: Why this matches the "bandwidth expansion" narrative