Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks
Kairong Yu, Chengting Yu, Tianqing Zhang, Xiaochen Zhao, Shu Yang, Hongwei Wang, Qiang Zhang, Qi Xu
TL;DR
This work targets the performance gap between Spiking Neural Networks (SNNs) and Artificial Neural Networks (ANNs) by tailoring knowledge distillation to the temporal nature of SNNs. The authors introduce Temporal Separation Knowledge Distillation with Entropy Regularization (TSER), which distills teacher logits at each time step and applies entropy-based stabilization to avoid propagating erroneous teacher knowledge. Key contributions include the temporal separation loss, the entropy regularization term, and extensive evaluation showing state-of-the-art results on CIFAR-10/100 and competitive performance on ImageNet, while maintaining energy-efficient operation. The approach advances practical SNN deployment by better leveraging spatiotemporal dynamics without adding prohibitive computation or time steps.
Abstract
Spiking Neural Networks (SNNs), inspired by the human brain, offer significant computational efficiency through discrete spike-based information transfer. Despite their potential to reduce inference energy consumption, a performance gap persists between SNNs and Artificial Neural Networks (ANNs), primarily due to current training methods and inherent model limitations. Recent research has tried to enhance SNN learning through knowledge distillation (KD) from ANN teacher networks. However, traditional distillation techniques often overlook the distinctive spatiotemporal properties of SNNs and therefore fail to fully exploit them. To overcome these challenges, we propose a novel logit distillation method characterized by temporal separation and entropy regularization. This approach improves existing SNN distillation techniques by distilling the logits at each individual time step rather than only the output aggregated over time. Furthermore, the integration of entropy regularization stabilizes model optimization and further boosts performance. Extensive experimental results indicate that our method surpasses prior SNN distillation strategies, whether based on logit distillation, feature distillation, or a combination of both. The code will be available on GitHub.
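The difference between distilling the time-aggregated output and distilling each time step separately can be illustrated with a small sketch. It assumes a standard temperature-scaled KL distillation loss, a time-averaged readout for the cross-entropy term, and a confidence-penalty form for the entropy regularizer. These are illustrative assumptions, not the authors' released code: all function names and the hyperparameters `alpha`, `beta`, and `tau` are hypothetical, and the paper's exact entropy term may differ in form and sign.

```python
import math


def softmax(logits, tau=1.0):
    """Numerically stable softmax with temperature tau."""
    scaled = [z / tau for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi + eps) for pi in p)


def time_average(per_step):
    """Average the student's logits over the T time steps (a common SNN readout)."""
    T = len(per_step)
    return [sum(col) / T for col in zip(*per_step)]


def aggregated_kd_loss(per_step, teacher_logits, tau=4.0):
    """Conventional KD: distill once, on the time-averaged student logits."""
    p_t = softmax(teacher_logits, tau)
    p_s = softmax(time_average(per_step), tau)
    return tau ** 2 * kl_div(p_t, p_s)


def temporal_separation_kd_loss(per_step, teacher_logits, tau=4.0):
    """Per-time-step KD: match the teacher at every step t, then average over T."""
    p_t = softmax(teacher_logits, tau)
    losses = [kl_div(p_t, softmax(step, tau)) for step in per_step]
    return tau ** 2 * sum(losses) / len(losses)


def total_loss(per_step, teacher_logits, label, alpha=1.0, beta=0.1, tau=4.0):
    """Hypothetical combined objective: CE + alpha * per-step KD - beta * entropy.

    ASSUMPTION: the entropy term is a confidence penalty on the per-step
    student predictions (higher entropy lowers the loss). The paper's
    actual entropy regularizer may be defined differently.
    """
    ce = -math.log(softmax(time_average(per_step))[label] + 1e-12)
    kd = temporal_separation_kd_loss(per_step, teacher_logits, tau)
    ent = sum(entropy(softmax(step)) for step in per_step) / len(per_step)
    return ce + alpha * kd - beta * ent


# Two time steps whose average equals the teacher's logits,
# although neither step agrees with the teacher on its own.
student = [[2.0, -1.0], [0.0, 1.0]]
teacher = [1.0, 0.0]
print(aggregated_kd_loss(student, teacher))               # 0.0
print(temporal_separation_kd_loss(student, teacher) > 0)  # True
```

In this toy case the aggregated loss is exactly zero because averaging hides the per-step disagreement, while the per-step loss stays positive. This is the kind of temporal mismatch that distilling each time step separately is meant to expose.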
