Efficient ANN-Guided Distillation: Aligning Rate-based Features of Spiking Neural Networks through Hybrid Block-wise Replacement

Shu Yang, Chengting Yu, Lei Liu, Hanzhi Ma, Aili Wang, Erping Li

TL;DR

This work addresses how to use pretrained ANN knowledge to guide SNN learning without the high cost of full backpropagation through time (BPTT). It introduces a rate-based, block-wise ANN-guided distillation framework that builds intermediate hybrid models by replacing ANN blocks with rate-aligned SNN modules via learnable mappings, trained with a combined loss of cross-entropy and KL-based distillation. The framework implicitly aligns rate-based SNN representations with ANN features, mitigates gradient distortion from the straight-through estimator (STE), and uses rate-based backpropagation to decouple training from the time dimension. Empirically, it achieves state-of-the-art or competitive results on CIFAR-10/100, CIFAR10-DVS, and ImageNet, with lower training overhead than traditional BPTT methods.
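
The combined objective mentioned above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the weighting factor `alpha`, the `temperature` parameter, the KL direction (teacher relative to student), and all function names are assumptions made for illustration, and the summary does not state how the two terms are actually balanced.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and a hard integer label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def combined_loss(student_logits, teacher_logits, label,
                  alpha=0.5, temperature=1.0):
    """Assumed weighting: (1 - alpha) * CE(student, label) + alpha * KL."""
    ce = cross_entropy(student_logits, label)
    kl = kl_divergence(teacher_logits, student_logits, temperature)
    return (1.0 - alpha) * ce + alpha * kl


# Toy usage: ANN (teacher) logits guide SNN (student) logits for class 2.
ann_logits = [0.2, 0.5, 3.0]
snn_logits = [0.1, 1.0, 2.0]
loss = combined_loss(snn_logits, ann_logits, label=2, alpha=0.5)
```

In this sketch the cross-entropy term ties the student to the hard label while the KL term pulls its output distribution toward the teacher's. Setting `alpha=0` recovers plain supervised training.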

Abstract

Spiking Neural Networks (SNNs) have garnered considerable attention as a potential alternative to Artificial Neural Networks (ANNs). Recent studies have highlighted SNNs' potential on large-scale datasets. For SNN training, two main approaches exist: direct training and ANN-to-SNN (ANN2SNN) conversion. To fully leverage existing ANN models in guiding SNN learning, either direct ANN-to-SNN conversion or ANN-SNN distillation training can be employed. In this paper, we propose an ANN-SNN distillation framework from the ANN-to-SNN perspective, designed with a block-wise replacement strategy for ANN-guided learning. By generating intermediate hybrid models that progressively align SNN feature spaces to those of ANN through rate-based features, our framework naturally incorporates rate-based backpropagation as a training method. Our approach achieves results comparable to or better than state-of-the-art SNN distillation methods, showing both training and learning efficiency.

Paper Structure

This paper contains 14 sections, 17 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: Framework Overview. In the rate propagation of the SNN, the rate-based representation is transformed via learnable modules to align with the ANN’s intermediate feature space and then propagated forward along the subsequent ANN layers, yielding a series of hybrid model outputs, $\{y_{M_0}, y_{M_1}, y_{M_2}\}$. At the logits end, losses are computed among the outputs of the ANN ($y_{A}$), the SNN ($y_{S}$), and the hybrid models. The ANN-guided distillation loss interacts with the ANN output, while the standard cross-entropy loss interacts with the hard labels $y$.
  • Figure 2: Measures of feature similarity. The cosine similarity distances are obtained with ResNet-18 on CIFAR-100. Each subplot is labeled according to the naming convention “R18(ResNet-18)-T(timesteps)-M($M_k$).”
  • Figure 3: Validation accuracy of the ANN-SNN hybrid models and the SNN model during training. The results are obtained with ResNet-18 on CIFAR-10. Each subplot is labeled according to the naming convention “R18(ResNet-18)-T(timesteps)-M($M_k$)/S(SNN).”
  • Figure 4: Comparison of training overhead between our method and BPTT for direct training. The results are averaged over three epochs of stable operation with ResNet-18 on CIFAR-100, using a single NVIDIA 3090 GPU.
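
The hybrid models described in the Figure 1 caption can be sketched as a toy forward pass in plain Python. This is only an illustration under stated assumptions: the indexing convention (hybrid model $M_k$ runs SNN rate blocks 0 through k, applies one mapping, then continues through ANN blocks k+1 onward), the function `hybrid_forward`, and the toy blocks are all hypothetical and not taken from the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def hybrid_forward(x: Vector,
                   snn_rate_blocks: Sequence[Block],
                   mappings: Sequence[Block],
                   ann_blocks: Sequence[Block],
                   k: int) -> Vector:
    """Output of hybrid model M_k under the assumed indexing convention."""
    h = x
    # SNN part: propagate rate-based features through blocks 0..k.
    for block in snn_rate_blocks[: k + 1]:
        h = block(h)
    # Mapping (learnable in the paper) into the ANN feature space.
    h = mappings[k](h)
    # ANN part: continue through the remaining ANN blocks.
    for block in ann_blocks[k + 1:]:
        h = block(h)
    return h


def ann_block(v: Vector) -> Vector:
    """Toy ANN block: doubles every feature."""
    return [2.0 * a for a in v]


def snn_rate_block(v: Vector) -> Vector:
    """Toy rate block: same doubling, but rates are clipped to [0, 1]."""
    return [min(1.0, max(0.0, 2.0 * a)) for a in v]


def identity(v: Vector) -> Vector:
    """Placeholder for a mapping module; a real one would be trained."""
    return list(v)


ann_blocks = [ann_block] * 3
snn_blocks = [snn_rate_block] * 3
mappings = [identity] * 3

x = [0.3]
hybrid_outputs = [hybrid_forward(x, snn_blocks, mappings, ann_blocks, k)
                  for k in range(3)]
```

In this toy setting the hybrid outputs drift away from the pure ANN output as more blocks are replaced, because the clipped rates lose information. That gap is what the mappings and the distillation loss are meant to close.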