SpikeAtConv: An Integrated Spiking-Convolutional Attention Architecture for Energy-Efficient Neuromorphic Vision Processing

Wangdan Liao, Weidong Wang

Abstract

Spiking Neural Networks (SNNs) offer a biologically inspired alternative to conventional artificial neural networks, with potential gains in power efficiency from their event-driven computation. Despite this promise, SNNs have yet to achieve competitive performance on complex visual tasks such as image classification. This study introduces SpikeAtConv, an SNN architecture designed to improve both computational efficiency and task accuracy. The architecture uses optimized spiking modules that process the spatio-temporal patterns in visual data, aiming to reconcile the computational demands of high-level vision tasks with the energy-efficient processing of SNNs. Evaluations on standard image classification benchmarks indicate that the proposed architecture narrows the performance gap with traditional neural networks. They also offer insights into the design of more efficient and capable neuromorphic computing systems.
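The event-driven behaviour described above comes from spiking neurons that stay silent until their membrane potential crosses a threshold. According to the Figure 2 caption, the SPK blocks are built on LIF (leaky integrate-and-fire) neurons. The paper's own neuron equation is not reproduced on this page, so what follows is a minimal sketch that assumes a common discrete-time LIF formulation with time constant `tau`, firing threshold `v_threshold`, and a hard reset to `v_reset`. The function name `lif_forward` and all default values are hypothetical.

```python
from typing import List, Tuple


def lif_forward(inputs: List[float], tau: float = 2.0,
                v_threshold: float = 1.0,
                v_reset: float = 0.0) -> Tuple[List[int], List[float]]:
    """Simulate a single LIF neuron over len(inputs) time steps.

    Assumed discrete-time update (hypothetical, not taken from the paper):
      charge: h[t] = v[t-1] + (x[t] - (v[t-1] - v_reset)) / tau
      fire:   s[t] = 1 if h[t] >= v_threshold else 0
      reset:  v[t] = v_reset if s[t] == 1 else h[t]
    """
    v = v_reset
    spikes: List[int] = []
    potentials: List[float] = []
    for x in inputs:
        h = v + (x - (v - v_reset)) / tau  # leaky integration toward the input
        s = 1 if h >= v_threshold else 0   # binary, event-like output
        v = v_reset if s else h            # hard reset after a spike
        spikes.append(s)
        potentials.append(v)
    return spikes, potentials


spikes, _ = lif_forward([1.5] * 4)
print(spikes)  # [0, 1, 0, 1]
```

With a constant input of 1.5, the neuron charges for one step and fires on the next. A dense analog input therefore becomes a sparse binary spike train, and this sparsity is what event-driven hardware can exploit.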

Paper Structure

This paper contains 11 sections, 1 equation, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Overview of SpikeAtConv. The model consists of three main components: the Feature Extraction Layer, the Feature Encoding Layer, and the Decision Layer. The Feature Extraction Layer first processes the input image and extracts its essential characteristics. The Feature Encoding Layer then analyzes the extracted features to distill the salient ones. Finally, the Decision Layer synthesizes this information to generate the prediction.
  • Figure 2: SPKBlock. Building on LIF neurons, we designed several SPK blocks to study how different hyperparameters and different combinations of neurons affect network performance. For example, the MBPL Block consists of multiple parallel neurons with different thresholds, while the DCL Block consists of two parallel branches, each containing a convolutional layer and an LIF neuron.
  • Figure 3: Attention SpikeMerge Block. Two alternative computational schemes are shown. In the SISA Block, after computing Q, K, and V, we apply a separate SPK Block to each to obtain their spike forms. We then use Q and K to compute the attention scores, apply these scores to V, and pass the result through another SPK Block to convert the attention output into spike sequences. The BDSA Block skips the computation of Q, K, and V: an SPK Block converts the input directly into spike sequences, which serve as Q, K, and V alike.
  • Figure 4: Comparison of Loss Between MaxViT and SpikeAtConv. We present the training and validation loss trajectories of the SpikeAtConv and MaxViT models. The lower plot highlights the loss over the first 10 epochs, where SpikeAtConv's loss decreases more slowly during the first 5 epochs. Because data augmentation techniques such as AutoAugment and mixup are applied during training, the training loss consistently remains higher than the validation loss.
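The Figure 2 caption describes the MBPL and DCL blocks only at the level of their structure. The sketch below shows one possible reading. Its assumptions are ours, not the paper's: activations are 1-D spatial signals of shape [T][N] (T time steps, N positions), every LIF neuron uses the same discrete-time update with a hard reset to 0, kernels have odd length, and the outputs of the parallel branches are summed. The helper names (`lif_layer`, `conv1d_same`, `mbpl_block`, `dcl_block`), thresholds, and kernels are all hypothetical.

```python
from typing import List, Sequence

Signal = List[List[float]]  # shape [T][N]: T time steps, N spatial positions


def lif_layer(seq: Signal, v_threshold: float = 1.0,
              tau: float = 2.0) -> List[List[int]]:
    """Element-wise LIF neurons (one per position) with hard reset to 0."""
    v = [0.0] * len(seq[0])
    out: List[List[int]] = []
    for x_t in seq:
        s_t = []
        for i, x in enumerate(x_t):
            h = v[i] + (x - v[i]) / tau       # leaky integration
            s = 1 if h >= v_threshold else 0  # fire
            v[i] = 0.0 if s else h            # hard reset
            s_t.append(s)
        out.append(s_t)
    return out


def conv1d_same(x: List[float], kernel: Sequence[float]) -> List[float]:
    """1-D cross-correlation with zero padding; output length equals len(x)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(w * padded[i + j] for j, w in enumerate(kernel))
            for i in range(len(x))]


def mbpl_block(seq: Signal,
               thresholds: Sequence[float] = (0.5, 1.0, 1.5)) -> List[List[int]]:
    """Hypothetical MBPL block: parallel LIF neurons with different thresholds.

    Branch outputs are summed (an assumption), giving a spike count per position.
    """
    branches = [lif_layer(seq, v_threshold=th) for th in thresholds]
    return [[sum(b[t][i] for b in branches) for i in range(len(seq[0]))]
            for t in range(len(seq))]


def dcl_block(seq: Signal,
              k1: Sequence[float] = (0.25, 0.5, 0.25),
              k2: Sequence[float] = (1.0, 0.0, -1.0)) -> List[List[int]]:
    """Hypothetical DCL block: two conv -> LIF branches whose spikes are summed."""
    b1 = lif_layer([conv1d_same(x_t, k1) for x_t in seq])
    b2 = lif_layer([conv1d_same(x_t, k2) for x_t in seq])
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(b1, b2)]


print(mbpl_block([[1.5, 1.5]] * 4))      # [[1, 1], [2, 2], [1, 1], [2, 2]]
print(dcl_block([[0.0, 2.0, 0.0]] * 2))  # [[0, 0, 1], [0, 0, 1]]
```

Summing the MBPL branches produces a graded spike count (here 0 to 3) instead of a binary spike. Averaging or concatenating the branches would fit the caption equally well, and each choice changes what the next layer receives.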
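The two schemes in Figure 3 differ mainly in where spikes are produced. The sketch below contrasts them for a single time step. It rests on three assumptions that are ours, not the paper's. First, each SPK block is approximated by a Heaviside threshold; a real SPK block would run LIF dynamics over T steps. Second, Q, K, and V come from plain linear projections. Third, the softmax is replaced by a constant `scale`, since spike-valued Q and K already produce non-negative scores. All function names and the default `scale` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are features


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def spk(m: Matrix, threshold: float = 1.0) -> Matrix:
    """Single-step stand-in for an SPK block: fire where input reaches threshold."""
    return [[1.0 if x >= threshold else 0.0 for x in row] for row in m]


def spike_attention(q: Matrix, k: Matrix, v: Matrix, scale: float) -> Matrix:
    """Scores = scale * Q K^T (no softmax, by assumption), then applied to V."""
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(scores, v)


def sisa_block(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
               scale: float = 0.5) -> Matrix:
    """SISA: project to Q, K, V, spike each one, attend, then spike the result."""
    q, k, v = spk(matmul(x, wq)), spk(matmul(x, wk)), spk(matmul(x, wv))
    return spk(spike_attention(q, k, v, scale))


def bdsa_block(x: Matrix, scale: float = 0.5) -> Matrix:
    """BDSA: skip the projections; one spike tensor serves as Q, K, and V."""
    s = spk(x)
    return spk(spike_attention(s, s, s, scale))


x = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]  # 3 tokens, 3 features
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(bdsa_block(x))  # [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
print(sisa_block(x, identity, identity, identity) == bdsa_block(x))  # True
```

In this toy input, token 0 picks up feature 1 from the overlapping token 2 through the attention scores. With identity projection weights, `sisa_block` reduces to `bdsa_block`. This makes the trade-off visible: BDSA saves three projections and three SPK blocks, but it forces Q, K, and V to be identical.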
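The Figure 4 caption attributes the gap between training and validation loss to data augmentation. For mixup, the reason can be made concrete. A mixed image carries a soft label such as [0.75, 0.25], and cross-entropy against a soft target can never fall below that target's entropy. Validation images, by contrast, keep one-hot labels, whose loss floor is zero. The sketch below assumes standard mixup with the mixing weight drawn from a Beta(alpha, alpha) distribution. The function names and the `alpha` value are illustrative, and AutoAugment is not modelled.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sample_lambda(alpha: float = 0.2) -> float:
    """Draw the mixup weight from Beta(alpha, alpha) (alpha is illustrative)."""
    return random.betavariate(alpha, alpha)


def mixup(x1: Vector, y1: Vector, x2: Vector, y2: Vector,
          lam: float) -> Tuple[Vector, Vector]:
    """Blend two inputs and their one-hot labels with the same weight lam."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def cross_entropy(target: Vector, probs: Vector) -> float:
    """Cross-entropy H(target, probs) in nats; zero-weight classes add nothing."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# Two toy "images" (flattened pixels) with one-hot labels for classes 0 and 1.
x_mixed, y_mixed = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.75)
# The best possible prediction for a soft target is the target itself,
# so its entropy is the lowest training loss this sample can reach.
floor = cross_entropy(y_mixed, y_mixed)
print(y_mixed, round(floor, 3))  # [0.75, 0.25] 0.562
```

For a mixing weight of 0.75, the floor is about 0.562 nats. Even a perfect model therefore reports a nonzero training loss on mixed batches, while the validation loss on clean, one-hot-labelled images can approach zero.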