Reducing Deep Network Complexity via Sparse Hierarchical Fourier Interaction Networks
Andrew Kiruluta, Samantha Williams
TL;DR
The paper tackles the high computational cost of long‑range interactions in deep networks by proposing Sparse Hierarchical Fourier Interaction Networks (SHFIN), a frequency‑domain operator that couples locality, sparsity, and low‑rank mixing. SHFIN processes input via hierarchical patchwise FFTs, applies a differentiable top‑K spectral mask learned with Gumbel‑Softmax, and employs a gated low‑rank bilinear mixer to model cross‑frequency interactions, achieving sub‑quadratic complexity and parameter efficiency. Across ImageNet‑1k, CIFAR, and WMT14 En→De, SHFIN delivers competitive or superior accuracy while reducing parameters, FLOPs, and latency relative to CNN, Transformer, and Fourier baselines. This work suggests a hardware‑friendly, interpretable alternative to traditional operators, with clear avenues for adaptive sparsity, hardware accelerators, and extensions to higher‑dimensional data.
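To make the parameter-efficiency claim concrete, here is a minimal NumPy sketch of a gated low-rank bilinear mixer. It is an illustration under stated assumptions, not the authors' formulation: the factor names `U`, `V`, `W`, the sigmoid gate, and the residual blend with the input are all hypothetical choices, using one common low-rank bilinear form (an elementwise product of two rank-r projections).

```python
import numpy as np

def low_rank_bilinear_mixer(z, U, V, W, gate_logits):
    """Mix K retained coefficients through a rank-r bilinear bottleneck.

    z: (K,) complex coefficients; U, V: (r, K); W: (K, r); gate_logits: (K,).
    All names and the gating form are assumptions made for illustration.
    """
    interaction = (U @ z) * (V @ z)             # (r,) products in the rank-r space
    mixed = W @ interaction                     # (K,) back to coefficient space
    gate = 1.0 / (1.0 + np.exp(-gate_logits))   # sigmoid gate in (0, 1)
    return gate * mixed + (1.0 - gate) * z      # gated blend with the input

rng = np.random.default_rng(0)
K, r = 16, 4
z = rng.standard_normal(K) + 1j * rng.standard_normal(K)
U = rng.standard_normal((r, K))
V = rng.standard_normal((r, K))
W = rng.standard_normal((K, r))
out = low_rank_bilinear_mixer(z, U, V, W, gate_logits=np.zeros(K))

low_rank_params = 3 * r * K      # U, V, and W together
full_bilinear_params = K ** 3    # a dense K x K x K interaction tensor
```

With K = 16 and r = 4, the factored form stores 192 weights against 4096 for a dense bilinear tensor, and the gap widens as K grows because the factored count is linear in K.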
Abstract
This paper presents Sparse Hierarchical Fourier Interaction Networks (SHFIN), an architectural building block that unifies three complementary principles of frequency-domain modeling: (i) a hierarchical patchwise Fourier transform that affords simultaneous access to local detail and global context; (ii) a learnable, differentiable top-K masking mechanism that retains only the most informative spectral coefficients, thereby exploiting the natural compressibility of visual and linguistic signals; and (iii) a gated low-rank bilinear mixer that models cross-frequency interactions.
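The patchwise transform and top-K masking described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's operator: it uses a fixed, magnitude-based top-K in place of the learned Gumbel-Softmax mask, works on a 1-D signal, and the function names and patch sizes are hypothetical.

```python
import numpy as np

def patchwise_fft(x, patch_size):
    """Split a 1-D signal into non-overlapping patches and FFT each patch."""
    x = np.asarray(x, dtype=float)
    n = len(x) - len(x) % patch_size   # drop a ragged tail, for simplicity
    patches = x[:n].reshape(-1, patch_size)
    return np.fft.fft(patches, axis=-1)

def top_k_mask(spectrum, k):
    """Keep the k largest-magnitude coefficients per patch and zero the rest.

    A hard magnitude rule stands in for the learned differentiable mask.
    """
    mag = np.abs(spectrum)
    idx = np.argpartition(mag, -k, axis=-1)[..., -k:]   # k largest per row
    mask = np.zeros_like(mag, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return spectrum * mask

def hierarchical_sparse_spectra(x, patch_sizes, k):
    """One sparse spectrum per scale: small patches for detail, large for context."""
    return {p: top_k_mask(patchwise_fft(x, p), k) for p in patch_sizes}

# A signal made of two sinusoids is fully described by four Fourier coefficients.
t = np.arange(64)
x = np.sin(2 * np.pi * 4 * t / 64) + 0.5 * np.sin(2 * np.pi * 8 * t / 64)
spectra = hierarchical_sparse_spectra(x, patch_sizes=(8, 16, 64), k=4)
recon = np.fft.ifft(spectra[64][0]).real   # rebuild from the coarsest scale
```

In this toy case, keeping 4 of the 64 coefficients at the coarsest scale reconstructs the signal to numerical precision, which is the compressibility the masking step relies on. Real images and text are only approximately sparse, so some reconstruction error is expected there.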
