Advancing Spiking Neural Networks towards Multiscale Spatiotemporal Interaction Learning

Yimeng Shan, Malu Zhang, Rui-jie Zhu, Xuerui Qiu, Jason K. Eshraghian, Haicheng Qu

TL;DR

The paper tackles the gap between Spiking Neural Networks (SNNs) and Artificial Neural Networks (ANNs) by addressing the underutilization of multiscale spatiotemporal information in event data. It introduces Spiking Multiscale Attention (SMA), which integrates multiscale coding with spatiotemporal attention to balance global and local features, and Attention ZoneOut (AZO), a regularization technique that uses attention weights to form pseudo-ensembles and improve generalization. Empirically, SMA and AZO deliver state-of-the-art results on neuromorphic datasets (e.g., CIFAR10-DVS and N-Caltech101) and set a new baseline on ImageNet-1K with a non-transformer architecture (77.1% top-1 for a 104-layer SMA-ResNet). This work demonstrates that incorporating multiscale spatiotemporal interactions into SNNs can bridge the performance gap with ANNs while maintaining energy-efficient, spike-driven computation. $N$ scales are used for multiscale coding, and attention weights $\boldsymbol{W}_\beta$ and $\boldsymbol{W}_{\beta,t}$ guide feature fusion, with AZO further enhancing robustness by strategic noise insertion at weak points via replacements determined by $\boldsymbol{\tau_t}$ and $\boldsymbol{\tau_c}$.
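
The replacement step that the TL;DR attributes to AZO can be illustrated with a small zoneout-style sketch. This is not the paper's formulation: it assumes that `tau_t` and `tau_c` act as thresholds on temporal and channel attention weights, and that a "weak" position is overwritten with the value from the previous time step. The function name `attention_zoneout`, the list-of-lists layout, and the use of fixed thresholds are all hypothetical choices made for illustration.

```python
from typing import List


def attention_zoneout(
    features: List[List[float]],  # features[t][c]: value at time step t, channel c
    w_t: List[float],             # assumed temporal attention weight per time step
    w_c: List[float],             # assumed channel attention weight per channel
    tau_t: float,                 # assumed threshold on temporal weights
    tau_c: float,                 # assumed threshold on channel weights
) -> List[List[float]]:
    """Replace weakly attended positions with the previous time step's value.

    Illustrative only: the thresholding rule is an assumption, not the
    paper's definition of AZO.
    """
    out = [row[:] for row in features]  # copy so the input is left untouched
    for t in range(1, len(features)):   # t = 0 has no previous step to copy from
        for c in range(len(features[t])):
            weak = w_t[t] < tau_t or w_c[c] < tau_c
            if weak:
                # Copy from the original (unmodified) previous time step.
                out[t][c] = features[t - 1][c]
    return out


example = attention_zoneout(
    features=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    w_t=[0.9, 0.1, 0.8],
    w_c=[0.7, 0.2],
    tau_t=0.5,
    tau_c=0.3,
)
print(example)
```

In this toy input, time step 1 has a low temporal weight, so both of its channels fall back to step 0, while at step 2 only the weakly weighted channel 1 is replaced. Unlike dropout, nothing is zeroed: the perturbation reuses earlier activations, which is the sense in which each training pass sees a slightly different sub-model.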

Abstract

Recent advancements in neuroscience research have propelled the development of Spiking Neural Networks (SNNs), which not only have the potential to further advance neuroscience research but also serve as an energy-efficient alternative to Artificial Neural Networks (ANNs) due to their spike-driven characteristics. However, previous studies often neglected multiscale information and its spatiotemporal correlation in event data, leading SNN models to approximate each frame of input events as a static image. We hypothesize that this oversimplification significantly contributes to the performance gap between SNNs and traditional ANNs. To address this issue, we have designed a Spiking Multiscale Attention (SMA) module that captures multiscale spatiotemporal interaction information. Furthermore, we developed a regularization method named Attention ZoneOut (AZO), which utilizes spatiotemporal attention weights to reduce the model's generalization error through pseudo-ensemble training. Our approach has achieved state-of-the-art results on mainstream neuromorphic datasets. Additionally, we have reached 77.1% top-1 accuracy on the ImageNet-1K dataset using a 104-layer ResNet architecture enhanced with SMA and AZO. This achievement confirms the state-of-the-art performance of SNNs with non-transformer architectures and underscores the effectiveness of our method in bridging the performance gap between SNN models and traditional ANN models.

Paper Structure

This paper contains 29 sections, 20 equations, 10 figures, 14 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Displaying Learning Patterns of Several Mainstream Models. All four images depict attention heat maps based on the Spiking Firing Rate (SFR), with red indicating high activation and blue representing low spiking activity.
  • Figure 2: The overview of Spiking Multiscale Attention (SMA) module. In the figure, the schematic diagram of the encoder is shown on the right side, the schematic diagram of the decoder is displayed in the lower-left corner, and the schematic diagram of the Multiscale SE (MSE) block module is positioned in the center.
  • Figure 3: Ablation study of different SMA positions on DVS128 Gesture. Inspired by previous work (tama), we placed SMA after the convolution layer in the first two groups of experiments.
  • Figure 4: Visualization of typical sample input frames and their attentional heatmaps: (a) shows the input frame; (b) and (c) display attention heat maps based on the Spiking Firing Rate (SFR), where red indicates high and blue shows low spiking activation. Heat map (b) is from the Spiking-VGG8 model and (c) from the SMA-SNN model. All heat maps are from the first convolutional layer of each model, except the coding layer. A spike count comparison is shown on the right.
  • Figure 5: Scale importance between different types of samples.
  • ...and 5 more figures
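
Several captions above describe heat maps based on the Spiking Firing Rate (SFR). As a rough sketch of what such a map could contain, the snippet below assumes the common definition of firing rate as the fraction of time steps in which a neuron emits a spike; the paper's exact SFR definition may differ. The function name `spiking_firing_rate` and the `spikes[t][y][x]` layout are hypothetical.

```python
from typing import List


def spiking_firing_rate(spikes: List[List[List[int]]]) -> List[List[float]]:
    """Per-position firing rate of a binary spike train.

    spikes[t][y][x] is 1 if the neuron at (y, x) fired at time step t, else 0.
    Returns rate[y][x] = (number of spikes at (y, x)) / (number of time steps).
    """
    num_steps = len(spikes)
    height = len(spikes[0])
    width = len(spikes[0][0])
    return [
        [sum(spikes[t][y][x] for t in range(num_steps)) / num_steps
         for x in range(width)]
        for y in range(height)
    ]


# Four time steps over a 2x2 grid of neurons.
toy_spikes = [
    [[1, 0], [0, 0]],
    [[1, 1], [0, 0]],
    [[1, 0], [0, 1]],
    [[1, 1], [0, 0]],
]
rates = spiking_firing_rate(toy_spikes)
print(rates)
```

Values near 1 would be drawn in red and values near 0 in blue under the color convention the captions describe.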