
BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation

Zekai Xu, Kang You, Qinghai Guo, Xiang Wang, Zhezhi He

TL;DR

BKDSNN narrows the accuracy gap between learning-based SNNs and ANNs by introducing blurred knowledge distillation (BKD), in which a restoration block recovers ANN features from randomly blurred SNN features. BKD is applied to the intermediate feature map right before the last layer and can be combined with logits-based distillation in a mixed-distillation regime. Among learning-based methods, this yields state-of-the-art results on CIFAR10/100 and ImageNet for both CNN- and Transformer-based SNNs, as well as on neuromorphic datasets such as CIFAR10-DVS. The approach improves feature alignment and gradient estimation, achieving strong accuracy with ultra-low time-steps and favorable energy-accuracy trade-offs. The results suggest that BKDSNN is a practical and scalable way to narrow the gap between SNNs and ANNs in real-world vision tasks.
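The mixed-distillation regime can be pictured as a weighted sum of a task loss, a logits-level KD term, and the feature-level BKD term. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it assumes cross-entropy on the SNN logits already averaged over time-steps, Hinton-style temperature-scaled KL divergence for the logits term, and a BKD loss computed elsewhere and passed in as a scalar. The weights `alpha`, `beta` and the temperature `tau` are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def softmax(z, tau=1.0):
    z = z / tau
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Standard task loss on the (time-averaged) SNN logits
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))

def logits_kd_loss(student_logits, teacher_logits, tau=4.0):
    # KL(teacher || student) on temperature-softened distributions, scaled by tau^2
    p_t = softmax(teacher_logits, tau)
    p_s = softmax(student_logits, tau)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return float(tau ** 2 * np.mean(kl))

def mixed_distillation_loss(snn_logits, ann_logits, labels, bkd_term,
                            alpha=1.0, beta=1.0, tau=4.0):
    """Hypothetical total loss: CE + alpha * logits KD + beta * feature BKD term.

    snn_logits: (N, K) student outputs, assumed averaged over time-steps.
    ann_logits: (N, K) outputs of the pretrained ANN teacher.
    bkd_term:   scalar feature-level BKD loss computed elsewhere.
    """
    return (cross_entropy(snn_logits, labels)
            + alpha * logits_kd_loss(snn_logits, ann_logits, tau)
            + beta * bkd_term)
```

When the student and teacher logits coincide, the KD term vanishes and only the task loss and the weighted BKD term remain; this makes it easy to check each component in isolation before tuning the weights.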

Abstract

Spiking neural networks (SNNs), which mimic biological neural systems by conveying information through discrete spikes, are well known as brain-inspired models with excellent computing efficiency. By using surrogate gradient estimation for discrete spikes, learning-based SNN training methods that achieve ultra-low inference latency (i.e., a small number of time-steps) have recently emerged. Nevertheless, because precise gradient estimation for discrete spikes is difficult to derive in learning-based methods, a distinct accuracy gap persists between SNNs and their artificial neural network (ANN) counterparts. To address this issue, we propose a blurred knowledge distillation (BKD) technique, which leverages randomly blurred SNN features to restore and imitate the ANN features. Our BKD is applied to the feature map right before the last layer of the SNN, and it can be combined with prior logits-based knowledge distillation to maximize the accuracy gain. To the best of our knowledge, among learning-based methods, our work achieves state-of-the-art performance for training SNNs on both static and neuromorphic datasets. On ImageNet, BKDSNN outperforms the prior best results by 4.51% and 0.93% with CNN and Transformer network topologies, respectively.
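Surrogate-gradient training replaces the derivative of the spike function, which is zero almost everywhere, with a piecewise stand-in so that back-propagation through time can proceed. As a minimal sketch, assuming an integrate-and-fire (IF) neuron with hard reset and a rectangular surrogate window (the source does not specify the neuron parameters or surrogate shape, so `v_th` and `width` are hypothetical), the forward pass over T time-steps and the surrogate derivative could look like this:

```python
import numpy as np

def if_neuron_forward(inputs, v_th=1.0):
    """Simulate integrate-and-fire (IF) neurons over T time-steps.

    inputs: array of shape (T, N), the input current per time-step.
    Returns binary spikes (T, N) and the pre-reset membrane potentials (T, N),
    which are where the surrogate gradient is evaluated during BPTT.
    """
    T, N = inputs.shape
    v = np.zeros(N)
    spikes = np.zeros((T, N))
    v_pre = np.zeros((T, N))
    for t in range(T):
        v = v + inputs[t]              # integrate input (IF: no leak term)
        v_pre[t] = v
        s = (v >= v_th).astype(float)  # Heaviside firing: non-differentiable
        spikes[t] = s
        v = v * (1.0 - s)              # hard reset to 0 wherever a spike fired
    return spikes, v_pre

def rect_surrogate_grad(v, v_th=1.0, width=1.0):
    """Surrogate for dS/dV: a rectangle of height 1/width centred on v_th."""
    return (np.abs(v - v_th) < width / 2).astype(float) / width
```

For example, feeding a constant input of 0.6 for four time-steps yields the spike train [0, 1, 0, 1], a firing rate of 0.5. Only potentials that land inside the window around the threshold pass gradient, which illustrates why such estimates are imprecise and why extra supervision from an ANN teacher can help.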

Paper Structure (38 sections, 16 equations, 7 figures, 10 tables, 1 algorithm)

Figures (7)

  • Figure 1: Overview of training an SNN with blurred knowledge distillation (BKD). The SNN is trained directly with back-propagation through time (BPTT) [wu2019direct], and a blurred variant of KD (i.e., BKD) is used to achieve higher accuracy. BKD, highlighted in the yellow shaded region, differs from prior SNN-oriented KD in three respects: 1) a blurred matrix $\bm{B}$ is randomly sampled on the fly (per input image) to mask out the feature of the student SNN; 2) a restoration block $\mathcal{G}$, consisting of two convolutional layers connected by a ReLU layer, is applied to the blurred SNN features to restore and mimic the ANN features; 3) this blurred knowledge distillation is applied only to the intermediate features before the last layer.
  • Figure 2: (a) Comparison of feature visualizations in ResNet-18 under various methods; (b) histograms of SNN and ANN features.
  • Figure 3: (a) Feature map visualization of different methods on SEW ResNet-18 (CNN) and Sformer-8-384+CML (Transformer). Input images are sampled from the ImageNet validation set. Colors from blue to red indicate the low-to-high impact of each region on the classification score. (b) Top-1 accuracy versus GPU hours on ImageNet with SEW ResNet-50 and Spikingformer-8-768.
  • Figure 4: Evolution of validation accuracy during training for CNN-based (top three sub-figures) and Transformer-based (bottom three sub-figures) SNNs on ImageNet. The legend labels are the same as in the distillation ablation table.
  • Figure A1: Illustration of gradient correction with BKD for the integrate-and-fire (IF) neuron.
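The three BKD ingredients listed in the Figure 1 caption (a per-image random blurred matrix, a conv-ReLU-conv restoration block, and a feature-level loss against the ANN) can be sketched in NumPy as follows. None of these details come from the source and all are assumptions: the blurred matrix is modeled as a binary mask over spatial positions shared across channels; `mask_ratio`, `init_restoration_params`, and the hidden width are hypothetical; the two convolutions use 1×1 kernels for brevity; the SNN feature is assumed to be averaged over time-steps already; and the distillation loss is taken to be mean squared error.

```python
import numpy as np

def sample_blur_mask(n, h, w, mask_ratio, rng):
    """Hypothetical blurred matrix B: one binary mask per input image over spatial
    positions, shared by all channels; about `mask_ratio` of positions are zeroed."""
    return (rng.random((n, 1, h, w)) >= mask_ratio).astype(float)

def conv1x1(x, weight, bias):
    # x: (N, C_in, H, W), weight: (C_out, C_in), bias: (C_out,)
    return np.einsum("oc,nchw->nohw", weight, x) + bias[None, :, None, None]

def restoration_block(x, params):
    # G: conv -> ReLU -> conv (1x1 kernels assumed here for brevity)
    hidden = np.maximum(conv1x1(x, params["w1"], params["b1"]), 0.0)
    return conv1x1(hidden, params["w2"], params["b2"])

def init_restoration_params(c_snn, c_ann, c_hidden, rng, scale=0.1):
    # Hypothetical random init; G also maps SNN channels to ANN channels
    return {
        "w1": scale * rng.standard_normal((c_hidden, c_snn)), "b1": np.zeros(c_hidden),
        "w2": scale * rng.standard_normal((c_ann, c_hidden)), "b2": np.zeros(c_ann),
    }

def bkd_loss(snn_feat, ann_feat, params, mask_ratio, rng):
    """snn_feat: (N, C_snn, H, W), assumed time-averaged features before the last layer.
    ann_feat: (N, C_ann, H, W), teacher features at the same position.
    Returns the MSE between G(B * F_snn) and F_ann."""
    n, _, h, w = snn_feat.shape
    mask = sample_blur_mask(n, h, w, mask_ratio, rng)  # resampled on every call
    restored = restoration_block(snn_feat * mask, params)
    return float(np.mean((restored - ann_feat) ** 2))
```

Because the mask is resampled on every call, the same SNN feature map is blurred differently at each training step. This corresponds to the "randomly sampled on the fly" property in the caption, and it forces $\mathcal{G}$ to reconstruct the ANN feature from partial information rather than copy it.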