Advancing Spiking Neural Networks towards Multiscale Spatiotemporal Interaction Learning
Yimeng Shan, Malu Zhang, Rui-jie Zhu, Xuerui Qiu, Jason K. Eshraghian, Haicheng Qu
TL;DR
The paper tackles the performance gap between Spiking Neural Networks (SNNs) and Artificial Neural Networks (ANNs) by addressing the underuse of multiscale spatiotemporal information in event data. It introduces Spiking Multiscale Attention (SMA), which combines multiscale coding with spatiotemporal attention to balance global and local features, and Attention ZoneOut (AZO), a regularization technique that uses attention weights to form pseudo-ensembles and improve generalization. Empirically, SMA and AZO deliver state-of-the-art results on neuromorphic datasets (e.g., CIFAR10-DVS and N-Caltech101) and set a new baseline on ImageNet-1K for a non-transformer architecture (77.1% top-1 for a 104-layer SMA-ResNet). The work shows that incorporating multiscale spatiotemporal interactions into SNNs can help close the performance gap with ANNs while preserving energy-efficient, spike-driven computation. Multiscale coding uses $N$ scales, and the attention weights $\boldsymbol{W}_\beta$ and $\boldsymbol{W}_{\beta,t}$ guide feature fusion; AZO further improves robustness by inserting noise at weak points through replacements determined by $\boldsymbol{\tau}_t$ and $\boldsymbol{\tau}_c$.
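To make "multiscale coding followed by attention-weighted fusion" concrete, here is a minimal NumPy sketch. It is not the authors' SMA module: it assumes the $N$ scales come from parameter-free average pooling with kernel sizes 1, 3 and 5, replaces learned attention weights (such as $\boldsymbol{W}_\beta$) with a softmax over pooled feature means, and applies one temporal weight per time step. The function names `avg_pool_same` and `multiscale_attention` are hypothetical. The output is real-valued; in a spiking network it would typically feed the next layer of spiking neurons.

```python
import numpy as np


def avg_pool_same(x, k):
    """Average-pool the last two axes with a k x k window, stride 1 and zero
    padding, so the spatial size is preserved (k is assumed to be odd)."""
    pad = k // 2
    padded = np.pad(x, [(0, 0)] * (x.ndim - 2) + [(pad, pad), (pad, pad)])
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape[-2:]
    for di in range(k):
        for dj in range(k):
            out += padded[..., di:di + H, dj:dj + W]
    return out / (k * k)


def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multiscale_attention(x, kernel_sizes=(1, 3, 5)):
    """x: spike tensor of shape (T, C, H, W). Returns the fused features
    (same shape), the per-scale weights (N, C) and the temporal weights (T,)."""
    # Multiscale coding (assumed here to be average pooling): shape (N, T, C, H, W)
    feats = np.stack([avg_pool_same(x, k) for k in kernel_sizes])
    # Scale attention: one descriptor per scale and channel, softmax over scales
    desc = feats.mean(axis=(1, 3, 4))  # (N, C)
    w_scale = softmax(desc, axis=0)  # each column sums to 1 over the N scales
    fused = (w_scale[:, None, :, None, None] * feats).sum(axis=0)  # (T, C, H, W)
    # Temporal attention: one weight per time step, rescaled so the mean weight is 1
    t_desc = fused.mean(axis=(1, 2, 3))  # (T,)
    w_time = softmax(t_desc, axis=0) * len(t_desc)
    return fused * w_time[:, None, None, None], w_scale, w_time


rng = np.random.default_rng(0)
spikes = (rng.random((4, 2, 8, 8)) < 0.2).astype(float)  # T=4, C=2, 8x8 binary spikes
out, w_scale, w_time = multiscale_attention(spikes)
print(out.shape)  # (4, 2, 8, 8)
```

Small kernels keep local detail while larger ones summarize wider context; the softmax over scales lets each channel trade one off against the other, which is the global/local balance the summary describes.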
Abstract
Recent advances in neuroscience have propelled the development of Spiking Neural Networks (SNNs), which can both further neuroscience research and, owing to their spike-driven computation, serve as an energy-efficient alternative to Artificial Neural Networks (ANNs). However, previous studies have often neglected the multiscale information in event data and its spatiotemporal correlations, leading SNN models to treat each frame of input events as a static image. We hypothesize that this oversimplification is a major contributor to the performance gap between SNNs and traditional ANNs. To address this issue, we design a Spiking Multiscale Attention (SMA) module that captures multiscale spatiotemporal interaction information. We further develop a regularization method, Attention ZoneOut (AZO), which uses spatiotemporal attention weights to reduce the model's generalization error through pseudo-ensemble training. Our approach achieves state-of-the-art results on mainstream neuromorphic datasets. In addition, a 104-layer ResNet enhanced with SMA and AZO reaches 77.1% top-1 accuracy on ImageNet-1K. This result establishes state-of-the-art performance for SNNs with non-transformer architectures and underscores the effectiveness of our method in bridging the performance gap between SNNs and traditional ANNs.
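Zoneout, as originally proposed for recurrent networks, randomly keeps some units at their previous values instead of updating them, so each training step effectively trains a different sub-network (a pseudo-ensemble). The sketch below applies that idea to an attention block under clearly stated assumptions: a (time step, channel) position counts as "weak" when its temporal or channel attention weight falls below a threshold, and during training each element at a weak position is replaced by its pre-attention value with probability `p`. The thresholds `thresh_t` and `thresh_c`, this replacement rule and the name `attention_zoneout` are hypothetical stand-ins, not the paper's definition of AZO or of $\boldsymbol{\tau}_t$ and $\boldsymbol{\tau}_c$.

```python
import numpy as np


def attention_zoneout(x, attended, w_time, w_chan, thresh_t, thresh_c, p, rng,
                      training=True):
    """Hypothetical attention-guided zoneout.

    x:        pre-attention features, shape (T, C, H, W)
    attended: post-attention features, same shape
    w_time:   temporal attention weights, shape (T,)
    w_chan:   channel attention weights, shape (C,)
    """
    if not training:
        return attended  # inference: deterministic, no replacement
    # Assumed rule: a (t, c) position is weak if either attention weight is low
    weak = (w_time[:, None] < thresh_t) | (w_chan[None, :] < thresh_c)  # (T, C)
    # Independent zoneout draw per element, as in ordinary zoneout
    coin = rng.random(x.shape) < p
    mask = weak[:, :, None, None] & coin
    # Zoned-out elements keep their pre-attention value
    return np.where(mask, x, attended)


rng = np.random.default_rng(0)
x = rng.random((4, 3, 2, 2))
w_time = np.array([0.1, 0.4, 0.3, 0.2])
w_chan = np.array([0.5, 0.2, 0.3])
attended = x * w_time[:, None, None, None] * w_chan[None, :, None, None]
y_train = attention_zoneout(x, attended, w_time, w_chan, 0.15, 0.25, p=0.5, rng=rng)
y_eval = attention_zoneout(x, attended, w_time, w_chan, 0.15, 0.25, p=0.5, rng=rng,
                           training=False)
```

Because a fresh mask is drawn at every training step, the network sees many randomly perturbed sub-models, and the deterministic inference path (`training=False`) plays the role of their ensemble, in the same spirit as dropout.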
