Spiking Neural Networks Need High Frequency Information
Yuetong Fang, Deming Zhou, Ziqing Wang, Hongwei Ren, ZeCui Zeng, Lusong Li, Shibo Zhou, Renjing Xu
TL;DR
The paper investigates why Spiking Neural Networks (SNNs) underperform Artificial Neural Networks (ANNs) and identifies a fundamental frequency bias: spiking neurons naturally suppress high-frequency information. It provides a theoretical proof that spiking neurons act as low-pass filters and introduces Max-Former, which restores high-frequency content by adding Max-Pool to the patch embedding and using Depth-Wise Convolution for early-stage token mixing, while keeping Spiking Self-Attention (SSA) in the final stage. Empirically, Max-Former achieves 82.39% top-1 accuracy on ImageNet with 63.99M parameters, outperforming Spikformer by +7.58% while consuming ~30% less energy. It also delivers strong results on CIFAR-10/100 and neuromorphic benchmarks, and Max-ResNet-18 reaches state-of-the-art performance on CIFAR benchmarks. The work suggests that preserving high-frequency information is crucial for SNNs and offers simple, scalable architectural adjustments to enhance spike-based computation across vision tasks.
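The low-pass claim can be illustrated with a minimal sketch. It assumes the simplest leaky membrane update, v[t] = beta * v[t-1] + (1 - beta) * x[t], and ignores firing and reset, so it is a toy model and not the paper's proof. The names `leaky_integrate`, `gain`, and `beta` are hypothetical and chosen only for this illustration.

```python
import math


def leaky_integrate(x, beta=0.5):
    """Leaky membrane update without spiking or reset (assumed toy model)."""
    v = 0.0
    out = []
    for xt in x:
        v = beta * v + (1.0 - beta) * xt  # first-order recursive low-pass filter
        out.append(v)
    return out


def rms(signal):
    """Root-mean-square amplitude of a sequence."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def gain(freq, beta=0.5, n=2048, warmup=256):
    """Output/input RMS ratio for a sinusoid at `freq` cycles per time step."""
    x = [math.sin(2.0 * math.pi * freq * t) for t in range(n)]
    y = leaky_integrate(x, beta)
    # Skip the first samples so the initial transient does not bias the ratio.
    return rms(y[warmup:]) / rms(x[warmup:])


low = gain(0.01)   # slowly varying input
high = gain(0.45)  # input close to the Nyquist frequency
print(f"gain at f=0.01: {low:.3f}, gain at f=0.45: {high:.3f}")
```

Under these assumptions the slow input passes almost unchanged, while the fast input is attenuated to roughly one third of its amplitude. A larger `beta` (a slower leak) would suppress the fast input even more strongly.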
Abstract
Spiking Neural Networks promise brain-inspired and energy-efficient computation by transmitting information through binary (0/1) spikes. Yet, their performance still lags behind that of artificial neural networks, often assumed to result from information loss caused by sparse and binary activations. In this work, we challenge this long-standing assumption and reveal a previously overlooked frequency bias: spiking neurons inherently suppress high-frequency components and preferentially propagate low-frequency information. This frequency-domain imbalance, we argue, is the root cause of degraded feature representation in SNNs. Empirically, on Spiking Transformers, adopting Avg-Pool (low-pass) for token mixing lowers performance to 76.73% on CIFAR-100, whereas replacing it with Max-Pool (high-pass) pushes the top-1 accuracy to 79.12%. Accordingly, we introduce Max-Former, which restores high-frequency signals through two frequency-enhancing operators: (1) extra Max-Pool in patch embedding, and (2) Depth-Wise Convolution in place of self-attention. Notably, Max-Former attains 82.39% top-1 accuracy on ImageNet using only 63.99M parameters, surpassing Spikformer (74.81%, 66.34M) by +7.58%. Extending our insight beyond transformers, our Max-ResNet-18 achieves state-of-the-art performance on convolution-based benchmarks: 97.17% on CIFAR-10 and 83.06% on CIFAR-100. We hope this simple yet effective solution inspires future research to explore the distinctive nature of spiking neural networks. Code is available at https://github.com/bic-L/MaxFormer.
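The contrast between Avg-Pool and Max-Pool can be made concrete on a toy input. The sketch below is an illustration under stated assumptions, not the authors' implementation: it applies stride-1, kernel-3 pooling to a one-dimensional binary pattern and uses the sum of squared first differences as a crude proxy for high-frequency content. The helpers `pool1d` and `edge_energy` are hypothetical names, and the result holds for this particular edge-like input rather than for arbitrary signals.

```python
def pool1d(x, kernel, reduce_fn):
    """Stride-1 pooling with edge replication, so the output keeps the input length."""
    half = kernel // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [reduce_fn(padded[i:i + kernel]) for i in range(len(x))]


def mean(window):
    """Average of one pooling window."""
    return sum(window) / len(window)


def edge_energy(x):
    """Sum of squared first differences (assumed proxy for high-frequency content)."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))


# Toy binary activation map: a block of ones surrounded by zeros (two sharp edges).
spikes = [0.0] * 8 + [1.0] * 8 + [0.0] * 8

avg_out = pool1d(spikes, 3, mean)  # each edge is smeared into a three-step ramp
max_out = pool1d(spikes, 3, max)   # the block widens, but both edges stay sharp

print("input   :", edge_energy(spikes))
print("avg-pool:", edge_energy(avg_out))
print("max-pool:", edge_energy(max_out))
```

On this input, average pooling reduces the edge energy to one third of the original, while max pooling leaves it unchanged. This matches the intuition behind the abstract's comparison, although the reported accuracy gap comes from the full models and not from this proxy.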
