
Enhanced Self-Distillation Framework for Efficient Spiking Neural Network Training

Xiaochen Zhao, Chengting Yu, Kairong Yu, Lei Liu, Aili Wang

TL;DR

The paper tackles the challenge of training high-performance Spiking Neural Networks under limited compute by introducing a rate-based training framework augmented with lightweight auxiliary ANN branches. It further proposes reliability-separated self-distillation, which uses only trustworthy teacher signals from multiple branches and thereby mitigates negative transfer from unreliable predictions. Empirical results on CIFAR-10/100, ImageNet, and CIFAR10-DVS demonstrate substantial reductions in training memory and time while achieving competitive accuracy, bridging the gap between efficient rate-based methods and BPTT-based direct training. The approach offers practical benefits for energy-efficient SNN deployment, and open-source code is provided for reproducibility.
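To make "use only trustworthy teacher signals from multiple branches" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the reliability test (a teacher counts as reliable for a sample when its argmax equals the ground-truth label), the averaging of reliable teachers, the temperature `temperature`, the mixing weight `alpha`, and every function name are assumptions made for illustration. When no teacher is reliable, the student falls back to label supervision alone.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def reliable_teacher(branch_logits: Sequence[Sequence[float]], label: int,
                     temperature: float = 4.0) -> Optional[List[float]]:
    """Average the softened predictions of branches whose argmax matches the label.

    Assumption of this sketch: "reliable" means "correct on this sample".
    Returns None when no branch is reliable.
    """
    reliable = [z for z in branch_logits
                if max(range(len(z)), key=lambda k: z[k]) == label]
    if not reliable:
        return None
    probs = [softmax(z, temperature) for z in reliable]
    n_classes = len(probs[0])
    return [sum(p[k] for p in probs) / len(probs) for k in range(n_classes)]


def kl_div(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_k = 0 contribute 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def student_loss(student_logits: Sequence[float],
                 branch_logits: Sequence[Sequence[float]], label: int,
                 temperature: float = 4.0, alpha: float = 0.5) -> float:
    """Cross-entropy on the label plus a distillation term from reliable teachers only."""
    ce = -math.log(softmax(student_logits)[label])
    teacher = reliable_teacher(branch_logits, label, temperature)
    if teacher is None:
        return ce  # unreliable knowledge is discarded entirely
    # Hinton-style T^2 scaling keeps gradient magnitudes comparable across temperatures
    kd = kl_div(teacher, softmax(student_logits, temperature)) * temperature ** 2
    return (1 - alpha) * ce + alpha * kd
```

For example, with teacher logits `[[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]` and label `0`, only the first branch is kept as a teacher. Discarding unreliable teachers per sample, rather than down-weighting them, is the simplest reading of "decoupling" and is chosen here only for clarity.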

Abstract

Spiking Neural Networks (SNNs) exhibit exceptional energy efficiency on neuromorphic hardware due to their sparse activation patterns. However, conventional training methods based on surrogate gradients and Backpropagation Through Time (BPTT) not only lag behind Artificial Neural Networks (ANNs) in performance, but also incur significant computational and memory overheads that grow linearly with the temporal dimension. To enable high-performance SNN training under limited computational resources, we propose an enhanced self-distillation framework, jointly optimized with rate-based backpropagation. Specifically, the firing rates of intermediate SNN layers are projected onto lightweight ANN branches, and high-quality knowledge generated by the model itself is used to optimize substructures through the ANN pathways. Unlike traditional self-distillation paradigms, we observe that low-quality self-generated knowledge may hinder convergence. To address this, we decouple the teacher signal into reliable and unreliable components, ensuring that only reliable knowledge is used to guide the optimization of the model. Extensive experiments on CIFAR-10, CIFAR-100, CIFAR10-DVS, and ImageNet demonstrate that our method reduces training complexity while achieving high-performance SNN training. Our code is available at https://github.com/Intelli-Chip-Lab/enhanced-self-distillation-framework-for-snn.
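The abstract states that firing rates of intermediate SNN layers are projected onto lightweight ANN branches. A minimal sketch of that step follows, assuming the rate is the plain time average of binary spikes and that a branch is a single linear layer; the actual branch architecture is not described here, so `linear_branch` and its weights are hypothetical.

```python
from typing import List, Sequence


def firing_rate(spikes: Sequence[Sequence[int]]) -> List[float]:
    """Average binary spikes over T time steps: r_i = (1/T) * sum_t s_i[t].

    `spikes` is indexed [t][i] (time step, neuron).
    """
    T = len(spikes)
    n = len(spikes[0])
    return [sum(spikes[t][i] for t in range(T)) / T for i in range(n)]


def linear_branch(rates: Sequence[float], weights: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> List[float]:
    """Hypothetical lightweight ANN head: logits_k = sum_i W[k][i] * r_i + b_k."""
    return [sum(w * r for w, r in zip(row, rates)) + b
            for row, b in zip(weights, bias)]
```

Because the branch consumes a single time-aggregated vector instead of T spike maps, its cost does not scale with the number of time steps, which is consistent with the abstract's motivation of avoiding BPTT overheads that grow linearly with T.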

Paper Structure

This paper contains 31 sections, 15 equations, 9 figures, 7 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Performance of different training methods for ResNet-18 on CIFAR-100; our approach achieves better training efficiency and accuracy than current mainstream methods.
  • Figure 2: Framework Overview. Unlike the standard BPTT approach, we propose a rate-based framework that first performs a forward pass of temporal spiking activity to update the eligibility traces. In a subsequent forward pass, intermediate layer features are projected onto auxiliary ANN modules. A decoupling module then integrates teacher signals, which are jointly optimized with the ground-truth labels to supervise the corresponding substructures.
  • Figure 3: Ablation bar chart of the self-distillation module on ANNs and SNNs.
  • Figure 4: Comparison of training cost and test performance across different numbers of time steps.
  • Figure 5: Classification performance of classifiers at different depths during training.
  • ...and 4 more figures
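Figure 2 describes a first forward pass over the temporal spiking activity that updates eligibility traces. The sketch below shows why such a pass can use memory independent of T: one leaky integrate-and-fire (LIF) neuron is simulated while keeping only two running sums. Treating the "eligibility trace" as the time-averaged surrogate derivative, as well as the sigmoid surrogate, hard reset, and the values of `tau`, `v_th`, and `alpha`, are assumptions of this sketch rather than the paper's stated definitions.

```python
import math
from typing import Sequence, Tuple


def lif_rate_and_trace(inputs: Sequence[float], tau: float = 2.0,
                       v_th: float = 1.0, alpha: float = 4.0) -> Tuple[float, float]:
    """Simulate one LIF neuron for T = len(inputs) steps with O(1) state.

    Returns (firing rate, time-averaged surrogate derivative). The second value
    stands in for an eligibility trace; that interpretation is assumed here.
    """
    if not inputs:
        raise ValueError("need at least one time step")
    v = 0.0
    spike_sum = 0.0
    grad_sum = 0.0
    for x in inputs:
        v = v + (x - v) / tau                      # leaky integration toward input
        sig = 1.0 / (1.0 + math.exp(-alpha * (v - v_th)))
        grad_sum += alpha * sig * (1.0 - sig)      # sigmoid surrogate for d(spike)/dv
        s = 1.0 if v >= v_th else 0.0              # Heaviside spike
        spike_sum += s
        v = v * (1.0 - s)                          # hard reset to 0 after a spike
    T = len(inputs)
    return spike_sum / T, grad_sum / T
```

A BPTT implementation would instead retain the membrane potential and spike of every step for the backward pass, so its activation memory grows linearly with T, as the abstract notes.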