Efficient Logit-based Knowledge Distillation of Deep Spiking Neural Networks for Full-Range Timestep Deployment

Chengting Yu, Xiaochen Zhao, Lei Liu, Shu Yang, Gaoang Wang, Erping Li, Aili Wang

TL;DR

This work tackles the rigidity of SNN deployment caused by fixed inference timesteps and accuracy gaps to ANNs. It introduces a temporal-wise logits-based distillation framework that decouples targets across timesteps and augments learning with final ensemble self-distillation, backed by convergence proofs. Empirically, it achieves state-of-the-art performance among distillation-based SNN methods on CIFAR-10/100, ImageNet, and CIFAR10-DVS, while preserving training efficiency comparable to standard KD. The approach enables a single trained model to perform robustly across a full range of timesteps, facilitating flexible, energy-efficient deployment on neuromorphic hardware. Overall, the method advances SNN usability by providing theoretical guarantees and practical benefits for full-range timestep deployment.

Abstract

Spiking Neural Networks (SNNs) are emerging as a brain-inspired alternative to traditional Artificial Neural Networks (ANNs), prized for their potential energy efficiency on neuromorphic hardware. Despite this, SNNs often suffer from accuracy degradation compared to ANNs and face deployment challenges due to fixed inference timesteps, which require retraining for adjustments, limiting operational flexibility. To address these issues, our work considers the spatio-temporal property inherent in SNNs, and proposes a novel distillation framework for deep SNNs that optimizes performance across full-range timesteps without specific retraining, enhancing both efficacy and deployment adaptability. We provide both theoretical analysis and empirical validations to illustrate that training guarantees the convergence of all implicit models across full-range timesteps. Experimental results on CIFAR-10, CIFAR-100, CIFAR10-DVS, and ImageNet demonstrate state-of-the-art performance among distillation-based SNN training methods. Our code is available at https://github.com/Intelli-Chip-Lab/snn_temporal_decoupling_distillation.
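To make "full-range timestep deployment" concrete, the following is a minimal sketch of how one trained SNN could be read out at every inference budget T from a single forward pass. It assumes the ensemble output at budget T is the mean of the first T per-timestep logit vectors; the abstract does not state the aggregation rule, so this is an assumption. The function name `cumulative_ensemble_predictions` and the toy logits are hypothetical and do not come from the authors' code.

```python
from typing import List


def cumulative_ensemble_predictions(step_logits: List[List[float]]) -> List[int]:
    """Return the predicted class for every budget T = 1..len(step_logits).

    Assumption: the ensemble at budget T is the mean of the first T
    per-timestep logit vectors (the aggregation rule is not given in the text).
    """
    num_classes = len(step_logits[0])
    running_sum = [0.0] * num_classes
    predictions = []
    for t, logits in enumerate(step_logits, start=1):
        # Accumulate the logits emitted at timestep t.
        running_sum = [s + z for s, z in zip(running_sum, logits)]
        # Ensemble output of the implicit "T = t" model.
        mean_logits = [s / t for s in running_sum]
        predictions.append(max(range(num_classes), key=mean_logits.__getitem__))
    return predictions


# Made-up per-timestep outputs for one input: 4 timesteps, 3 classes.
toy_logits = [
    [0.2, 1.0, 0.1],
    [1.5, 0.3, 0.2],
    [1.1, 0.4, 0.0],
    [0.9, 0.2, 0.3],
]
preds = cumulative_ensemble_predictions(toy_logits)
print(preds)  # one prediction per inference budget T = 1, 2, 3, 4
```

In this toy case the prediction at T = 1 differs from the prediction at larger budgets, which is the kind of short-timestep disagreement a full-range training objective is meant to reduce.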

Paper Structure

This paper contains 24 sections, 19 equations, 4 figures, 14 tables, 1 algorithm.

Figures (4)

  • Figure 1: Illustration of the primary challenges and motivations. (a) Standard logits-based knowledge distillation (logits-KD) training suffers from large accuracy degradation and requires different models to adapt to various inference timestep settings. (b) The proposed distillation framework reduces the gap and ensures a single model for full-range timesteps.
  • Figure 2: Framework overview. (a) Standard Logit-based Distillation defines targets on the final ensemble outputs, where model convergence is not guaranteed with reductions in inference timesteps. (b) Temporal-wise Logit-based Distillation decouples the targets into each temporal output, resulting in the guaranteed convergence of all implicit full-range timestep models.
  • Figure 3: Loss Trends. Results of timestep ensembles during training using ResNet-18 on the CIFAR-100 dataset.
  • Figure 4: Visual Results of t-SNE Projections. The features are learned by (a) standard logits-based distillation and (b) the proposed temporal-wise distillation. Each subfigure progressively shows cumulative voting including more timesteps, with the final ensemble shown on the right.
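The contrast drawn in Figure 2 can be illustrated with a small numerical sketch. It is not the authors' implementation. The KL direction, the temperature value, the mean-of-logits ensemble, and all function names below are assumptions made for illustration, and the exact loss weighting used in the paper is not reproduced here.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    # KL(p || q); assumed direction: the target p comes from the teacher.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def mean_logits(step_logits: List[List[float]]) -> List[float]:
    # Assumed ensemble rule: average the per-timestep logits.
    return [sum(col) / len(step_logits) for col in zip(*step_logits)]


def standard_kd_loss(teacher, step_logits, temperature=4.0):
    """Target defined only on the final ensemble output (Figure 2a)."""
    p = softmax(teacher, temperature)
    return kl_divergence(p, softmax(mean_logits(step_logits), temperature))


def temporal_wise_kd_loss(teacher, step_logits, temperature=4.0):
    """Target decoupled into every timestep output (Figure 2b)."""
    p = softmax(teacher, temperature)
    per_step = [kl_divergence(p, softmax(z, temperature)) for z in step_logits]
    return sum(per_step) / len(per_step)


def ensemble_self_distill_loss(step_logits, temperature=4.0):
    """Final ensemble used as an extra target for each timestep output.

    In real training the ensemble target would be treated as a constant.
    """
    p_ens = softmax(mean_logits(step_logits), temperature)
    per_step = [kl_divergence(p_ens, softmax(z, temperature)) for z in step_logits]
    return sum(per_step) / len(per_step)


# Made-up logits: the two timestep outputs average exactly to the teacher.
teacher = [2.0, 0.0, -1.0]
steps = [[4.0, 0.0, -2.0], [0.0, 0.0, 0.0]]

loss_standard = standard_kd_loss(teacher, steps)
loss_temporal = temporal_wise_kd_loss(teacher, steps)
loss_self = ensemble_self_distill_loss(steps)
print(loss_standard, loss_temporal, loss_self)
```

Under these assumptions the ensemble-only loss is already zero even though neither individual timestep output matches the teacher, while the temporal-wise and self-distillation terms remain positive. This is one way to read the caption's point that a target placed only on the final ensemble does not constrain the models obtained with fewer timesteps.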