Efficient Deep Spiking Multi-Layer Perceptrons with Multiplication-Free Inference

Boyan Li, Luziwei Leng, Shuaijie Shen, Kaixuan Zhang, Jianguo Zhang, Jianxing Liao, Ran Cheng

TL;DR

This work tackles the difficulty of integrating attention-like mechanisms with Multiplication-Free Inference (MFI) in spike-based networks by designing an MFI-compatible spiking MLP-Mixer. It uses batch normalization (BN) to retain MFI compatibility, introduces a spiking patch encoding (SPE) module, and pairs a two-branch token block with a channel block, forming a multi-stage pyramid network that balances global receptive fields with local feature extraction. Empirical results on ImageNet-1K and CIFAR benchmarks show competitive top-1 accuracy (66.39% on ImageNet-1K without pretraining, and up to 71.64% for a larger variant) with significantly fewer simulation steps and lower energy cost than state-of-the-art deep spiking CNNs. The findings highlight the viability of deep SNNs built from spiking MLP primitives and point to future directions, such as adaptive thresholds and spike-based attention modules, for further closing the gap with ANN transformers.
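The two-branch token block and the channel block can be illustrated with a small NumPy sketch. It rests on our own assumptions, not on the paper's code: tokens lie on an H×W grid with C channels, the two branches mix tokens along the width axis and the height axis respectively (our reading of "axial sampling" in Figure 1), and the channel block mixes all channels of each token ("full sampling"). All function names are hypothetical. Batch normalization, skip connections, and stateful spiking neurons are omitted, and a stateless threshold stands in for the neuron.

```python
import numpy as np

def spike(v, threshold=1.0):
    """Stateless stand-in for a spiking neuron: 1 where v reaches the threshold.
    Assumption: a real SNN would use stateful neurons (e.g. LIF) over time steps."""
    return (v >= threshold).astype(np.float32)

def axial_mix(x, w_h, w_w):
    """Hypothetical two-branch axial token mixing (pre-activation).
    x: binary spikes of shape (H, W, C); w_h: (H, H); w_w: (W, W).
    Branch 1 mixes the W tokens of each row, branch 2 the H tokens of each column."""
    along_width = np.einsum("hwc,vw->hvc", x, w_w)
    along_height = np.einsum("hwc,gh->gwc", x, w_h)
    return along_width + along_height

def token_block(x, w_h, w_w):
    return spike(axial_mix(x, w_h, w_w))

def channel_block(x, w_c):
    """Mix all C channels of every token; w_c has shape (C_out, C)."""
    return spike(np.einsum("hwc,dc->hwd", x, w_c))

def mixer_block(x, w_h, w_w, w_c):
    """One hypothetical spiking MLP-Mixer block: token block, then channel block.
    BN and skip connections are deliberately left out of this sketch."""
    return channel_block(token_block(x, w_h, w_w), w_c)

rng = np.random.default_rng(0)
H, W, C, C_OUT = 4, 5, 3, 6
x = (rng.random((H, W, C)) < 0.3).astype(np.float32)  # random binary spike map
w_h = rng.normal(size=(H, H))
w_w = rng.normal(size=(W, W))
w_c = rng.normal(size=(C_OUT, C))
y = mixer_block(x, w_h, w_w, w_c)
print(y.shape)  # (4, 5, 6)
```

Under this reading, axial mixing needs H² + W² token weights instead of (HW)² for a dense token MLP. Each output token sees its full row and column, so two stacked blocks already cover the whole grid, which is one way a global receptive field can arise without attention.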

Abstract

Advancements in adapting deep convolutional architectures for Spiking Neural Networks (SNNs) have significantly enhanced image classification performance and reduced computational burdens. However, the inability of Multiplication-Free Inference (MFI) to align with attention and transformer mechanisms, which are critical to superior performance on high-resolution vision tasks, imposes limitations on these gains. To address this, our research explores a new pathway, drawing inspiration from the progress made in Multi-Layer Perceptrons (MLPs). We propose an innovative spiking MLP architecture that uses batch normalization to retain MFI compatibility and introduces a spiking patch encoding layer to enhance local feature extraction. As a result, we establish an efficient multi-stage spiking MLP network that effectively blends global receptive fields with local feature extraction for comprehensive spike-based computation. Without relying on pre-training or sophisticated SNN training techniques, our network achieves a top-1 accuracy of 66.39% on the ImageNet-1K dataset, surpassing the directly trained spiking ResNet-34 by 2.67%. Furthermore, we curtail computational costs, model parameters, and simulation steps. An expanded version of our network reaches 71.64% top-1 accuracy, comparable to the spiking VGG-16 network, while operating with a model capacity 2.1 times smaller. Our findings highlight the potential of our deep SNN architecture to effectively integrate global and local learning abilities. Interestingly, the trained receptive field in our network mirrors the activity patterns of cortical cells. Source code is publicly available at https://github.com/EMI-Group/mixer-snn.
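The abstract credits batch normalization with retaining MFI compatibility. One standard way to see why (our reading, not necessarily the paper's derivation): at inference, BN uses fixed running statistics, so it is a per-channel affine map that can be folded into the preceding linear layer's weights and bias. When that layer receives a binary spike vector, the product W·s reduces to summing the weight columns selected by the spikes, so only additions remain. The NumPy sketch below checks this numerically; layer sizes and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes, for illustration only.
in_features, out_features = 8, 4
W = rng.normal(size=(out_features, in_features))  # linear weights
b = rng.normal(size=out_features)                  # linear bias

# Inference-time BN parameters: running statistics are fixed constants.
gamma = rng.normal(size=out_features)
beta = rng.normal(size=out_features)
mean = rng.normal(size=out_features)
var = rng.uniform(0.5, 2.0, size=out_features)
eps = 1e-5

def linear_then_bn(s):
    """Reference path: linear layer followed by batch normalization."""
    y = W @ s + b
    return gamma * (y - mean) / np.sqrt(var + eps) + beta

# Fold BN into the linear layer: scale each output row, shift the bias.
scale = gamma / np.sqrt(var + eps)
W_folded = W * scale[:, None]
b_folded = (b - mean) * scale + beta

def accumulate_only(s):
    """MFI path: with binary spikes, W_folded @ s is a sum of selected columns."""
    out = b_folded.copy()
    for j in np.flatnonzero(s):   # only inputs that actually spiked
        out += W_folded[:, j]     # additions only, no multiplications
    return out

spikes = rng.integers(0, 2, size=in_features).astype(float)  # binary input
assert np.allclose(linear_then_bn(spikes), accumulate_only(spikes))
```

By contrast, layer normalization computes its mean and variance from each sample's activations at inference time, so it cannot be folded into constant weights and would reintroduce per-sample multiplications and divisions. This is presumably why BN is the MFI-friendly choice.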

Paper Structure

This paper contains 21 sections, 5 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: The overall network architecture. The multi-stage network is downsampled with an SPE module at each stage. Within each stage, the SPE is followed by a sequence of spiking MLP-Mixers with identical architecture, each containing a spiking token block with axial sampling and a spiking channel block with full sampling.
  • Figure 2: Key blocks in the proposed network architecture.
  • Figure 3: Spiking patch encoding with a directed acyclic graph structure. The structure adheres to the MFI principle, with additions performed on BN states and multiplications performed between convolution weights and binary spikes.
  • Figure 4: Potential skip connections for the spiking MLP-Mixer.
  • Figure 5: Mean network spiking rate of MLP-SPE-T on the ImageNet-1K test set. We distinguish each stage with different colors.
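Spiking rates, such as those plotted in Figure 5, feed directly into energy estimates. A common convention in the SNN literature, assumed here rather than taken from the paper, counts one accumulate (AC) per spike per synapse and prices it against a multiply-accumulate (MAC) of the equivalent ANN, using 45 nm estimates of 0.9 pJ per AC and 4.6 pJ per MAC. The sketch below applies that convention; `LayerStats` and the layer statistics are hypothetical placeholders, not measurements from the paper.

```python
from dataclasses import dataclass

E_MAC_PJ = 4.6  # assumed: 32-bit float multiply-accumulate, 45 nm estimate
E_AC_PJ = 0.9   # assumed: 32-bit float accumulate, 45 nm estimate

@dataclass
class LayerStats:
    name: str
    macs: int          # MACs of the equivalent ANN layer for one forward pass
    spike_rate: float  # mean fraction of input neurons firing per time step

def mean_spiking_rate(layers):
    """Unweighted mean of per-layer spiking rates (one possible definition)."""
    return sum(layer.spike_rate for layer in layers) / len(layers)

def snn_energy_pj(layers, time_steps):
    """Synaptic operations scale with spike rate and number of time steps;
    each one is an accumulate because the inputs are binary spikes."""
    sops = sum(layer.macs * layer.spike_rate * time_steps for layer in layers)
    return sops * E_AC_PJ

def ann_energy_pj(layers):
    """Every MAC of the equivalent ANN is executed once."""
    return sum(layer.macs for layer in layers) * E_MAC_PJ

# Hypothetical per-stage statistics, for illustration only.
layers = [
    LayerStats("stage1", macs=1_000_000, spike_rate=0.2),
    LayerStats("stage2", macs=2_000_000, spike_rate=0.1),
]
print(f"mean spiking rate: {mean_spiking_rate(layers):.2f}")
print(f"SNN energy (T=4): {snn_energy_pj(layers, time_steps=4) / 1e6:.2f} uJ")
print(f"ANN energy:       {ann_energy_pj(layers) / 1e6:.2f} uJ")
```

The estimate grows linearly with the number of time steps, which is why fewer simulation steps and lower spiking rates both reduce energy. This accounting ignores the first encoding layer, which usually receives real-valued input and therefore needs MACs, as well as the cost of membrane-potential updates.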