A Spatial-channel-temporal-fused Attention for Spiking Neural Networks

Wuque Cai, Hongze Sun, Rui Liu, Yan Cui, Jun Wang, Yang Xia, Dezhong Yao, Daqing Guo

TL;DR

Findings indicate that incorporating appropriate cognitive mechanisms of the brain may provide a promising approach to elevate the capabilities of SNNs.

Abstract

Spiking neural networks (SNNs) mimic the brain's computational strategies and exhibit substantial capabilities in spatiotemporal information processing. As an essential factor for human perception, visual attention refers to the dynamic process for selecting salient regions in biological vision systems. Although visual attention mechanisms have achieved great success in computer vision applications, they are rarely introduced into SNNs. Inspired by experimental observations on predictive attentional remapping, we propose in the present study a new spatial-channel-temporal-fused attention (SCTFA) module that can guide SNNs to efficiently capture underlying target regions by utilizing accumulated historical spatial-channel information. Through a systematic evaluation on three event stream datasets (DVS Gesture, SL-Animals-DVS and MNIST-DVS), we demonstrate that the SNN with the SCTFA module (SCTFA-SNN) not only significantly outperforms the baseline SNN (BL-SNN) and two other SNN models with degenerated attention modules, but also achieves accuracy competitive with existing state-of-the-art methods. Additionally, our detailed analysis shows that the proposed SCTFA-SNN model has strong robustness to noise and outstanding stability when faced with incomplete data, while maintaining acceptable complexity and efficiency. Overall, these findings indicate that incorporating appropriate cognitive mechanisms of the brain may provide a promising approach to elevate the capabilities of SNNs.

Paper Structure (18 sections, 16 equations, 9 figures, 5 tables, 1 algorithm)

Figures (9)

  • Figure 1: The architecture and unfolded form of the proposed SCTFA-SNN model. The SCTFA module is inserted into each convolutional (Conv) layer and converts the spiking feature maps (SFMs) into an attention tensor (blue). The SFMs extracted by the last convolutional layer are input into a dense layer for classification.
  • Figure 2: Diagram of the SCTFA module. The attention block in the SCTFA module is composed of both channel and spatial attention blocks. This attention block generates a 3-D attention tensor that excites corresponding neurons in the same layer, with its historical influence adjusted by the decay factor of the membrane potentials.
  • Figure 3: Average training curves of different SNN models on three datasets. (a) DVS Gesture, (b) SL-Animals-DVS and (c) MNIST-DVS. Different colors represent different SNN models: BL-SNN (blue), STFA-SNN (green), CTFA-SNN (orange) and SCTFA-SNN (red).
  • Figure 4: Visualization of the input frames of typical samples and their corresponding attentional heatmaps captured by the BL-SNN, STFA-SNN, CTFA-SNN and SCTFA-SNN models. (a) DVS Gesture and (b) SL-Animals-DVS. $R$ indicates the average firing ratio in the target class at the voting layer during the inference phase. CCW: Counter Clockwise.
  • Figure 5: Case study on the DVS Gesture dataset. (a) An example of the input frames of the left hand counterclockwise gesture at different timesteps. (b) Visualization of the average spiking activity of neurons for the first 16 channels in the first convolutional layer for different SNN models. Red represents high spiking activation, whereas blue indicates low spiking activation.
  • ...and 4 more figures
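
The mechanism described in the Figure 2 caption (channel and spatial attention combined into a 3-D attention tensor that excites neurons in the same layer, with its past influence fading through the membrane decay) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the weight-free sigmoid attention blocks, the hard reset, the function names `attention_tensor` and `run_attention_lif_layer`, and the parameters `decay` and `threshold` are all assumptions made for illustration only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_tensor(spikes):
    """Map spiking feature maps of shape (C, H, W) to a 3-D attention tensor.

    Assumption: both attention blocks are plain pooling followed by a
    sigmoid, with no learned weights (a real module would likely learn them).
    """
    channel = sigmoid(spikes.mean(axis=(1, 2)))  # (C,): one score per channel
    spatial = sigmoid(spikes.mean(axis=0))       # (H, W): one score per location
    # Outer combination gives one attention value per neuron in the layer.
    return channel[:, None, None] * spatial[None, :, :]


def run_attention_lif_layer(inputs, decay=0.5, threshold=1.0):
    """Simulate one leaky integrate-and-fire layer with attention feedback.

    inputs: float array of shape (T, C, H, W) holding input currents.
    Returns a binary spike array of the same shape.
    """
    u = np.zeros(inputs.shape[1:])     # membrane potentials
    s = np.zeros(inputs.shape[1:])     # spikes from the previous timestep
    attn = np.zeros(inputs.shape[1:])  # attention from the previous timestep
    out = np.zeros_like(inputs)
    for t in range(inputs.shape[0]):
        # Leaky integration with hard reset (assumed). The attention tensor
        # is added as excitatory input, so earlier attention contributions
        # shrink by `decay` at every step, like the membrane potential itself.
        u = decay * u * (1.0 - s) + inputs[t] + attn
        s = (u >= threshold).astype(inputs.dtype)
        out[t] = s
        attn = attention_tensor(s)
    return out
```

With the assumed defaults, the attention input alone stays below threshold (it is 0.25 per neuron when the layer is silent), so it biases neurons toward firing without causing spikes by itself.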