A Closer Look at Knowledge Distillation in Spiking Neural Network Training
Xu Liu, Na Xia, Jinxing Zhou, Jingyuan Xu, Dan Guo
TL;DR
This work tackles the challenge of training energy-efficient Spiking Neural Networks (SNNs) via knowledge distillation (KD) from pretrained artificial neural networks (ANNs). It introduces two KD strategies, Saliency-scaled Activation Map Distillation (SAMD) and Noise-smoothed Logits Distillation (NLD), to bridge the semantic and distribution gaps between continuous ANN features/logits and discrete, sparse SNN representations. SAMD aligns the SNN spike activation map (SAM) with the teacher's class activation map (CAM) using softmax-normalized saliency distributions, while NLD smooths SNN logits with Gaussian noise so that they resemble the teacher's continuous logits. Across CIFAR-10/100, ImageNet-1K, and CIFAR10-DVS, CKDSNN achieves state-of-the-art accuracy with favorable energy-efficiency trade-offs, demonstrating robust cross-domain knowledge transfer for SNNs.
Abstract
Spiking Neural Networks (SNNs) have become popular owing to their excellent energy efficiency, yet they remain difficult to train effectively. Recent works address this by introducing knowledge distillation (KD) techniques, with pre-trained artificial neural networks (ANNs) used as teachers and the target SNNs as students. This is commonly accomplished through a straightforward element-wise alignment of intermediate features and prediction logits from ANNs and SNNs, which often neglects the intrinsic differences between their architectures. Specifically, ANN outputs exhibit a continuous distribution, whereas SNN outputs are characterized by sparsity and discreteness. To mitigate this issue, we introduce two innovative KD strategies. First, we propose Saliency-scaled Activation Map Distillation (SAMD), which aligns the spike activation map of the student SNN with the class-aware activation map of the teacher ANN. Rather than performing KD directly on the raw features of the ANN and SNN, SAMD directs the student to learn from saliency activation maps that exhibit greater semantic and distribution consistency. Second, we propose Noise-smoothed Logits Distillation (NLD), which uses Gaussian noise to smooth the sparse logits of the student SNN, facilitating alignment with the continuous logits of the teacher ANN. Extensive experiments on multiple datasets demonstrate the effectiveness of our methods. Code is available at https://github.com/SinoLeu/CKDSNN.git.
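To make the two strategies concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that both losses are a KL divergence between softmax-normalized distributions, that the teacher CAM and student SAM arrive as precomputed, flattened spatial maps, and that the noise is added directly to the student logits. The function names and the hyperparameters `tau` and `sigma` are hypothetical; the abstract does not specify any of these choices.

```python
import math
import random


def softmax(values, tau=1.0):
    """Numerically stable softmax with temperature tau (hypothetical knob)."""
    scaled = [v / tau for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def samd_loss(teacher_cam, student_sam, tau=1.0):
    """Sketch of saliency-map alignment.

    Both inputs are flattened H*W maps. How the CAM and SAM are derived
    from the networks is not described here, so they are taken as given.
    Each map is turned into a distribution over spatial positions, and
    the student distribution is pulled toward the teacher's via KL
    (the choice of KL is an assumption).
    """
    p_teacher = softmax(teacher_cam, tau)
    p_student = softmax(student_sam, tau)
    return kl_divergence(p_teacher, p_student)


def nld_loss(teacher_logits, student_logits, sigma=0.1, tau=1.0, rng=random):
    """Sketch of noise-smoothed logit alignment.

    Zero-mean Gaussian noise with standard deviation sigma (assumed
    value) is added to the student logits before the softmax, then the
    result is compared with the teacher's distribution via KL.
    """
    noisy = [z + rng.gauss(0.0, sigma) for z in student_logits]
    p_teacher = softmax(teacher_logits, tau)
    p_student = softmax(noisy, tau)
    return kl_divergence(p_teacher, p_student)


# Toy inputs: a continuous teacher map/logits and a sparse, discrete student.
rng = random.Random(0)
teacher_cam = [0.1, 2.3, 1.7, 0.2]
student_sam = [0.0, 3.0, 1.0, 0.0]      # e.g. per-position spike counts
teacher_logits = [2.1, -0.4, 0.3]
student_logits = [3.0, 0.0, 0.0]        # e.g. accumulated output spikes

loss_samd = samd_loss(teacher_cam, student_sam)
loss_nld = nld_loss(teacher_logits, student_logits, sigma=0.1, rng=rng)
print(f"SAMD sketch loss: {loss_samd:.4f}")
print(f"NLD sketch loss:  {loss_nld:.4f}")
```

Normalizing each map with a softmax means the comparison depends on where the saliency mass lies rather than on raw magnitudes, which is one way to read the claim that saliency maps are more consistent across ANN and SNN than raw features are.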
