Self-Attentive Spatio-Temporal Calibration for Precise Intermediate Layer Matching in ANN-to-SNN Distillation

Di Hong, Yueming Wang

TL;DR

This paper tackles the accuracy gap between ANNs and SNNs by addressing spatio-temporal semantic mismatches during ANN-to-SNN distillation. It proposes Self-Attentive Spatio-Temporal Calibration (SASTC), which uses self-attention to allocate per-time-step associations between ANN and SNN intermediate features, guided by projected similarity matrices and learned queries/keys. A Spatio-Temporal Mismatch Score ($STM\ score$) quantifies misalignment and demonstrates that SASTC achieves semantic matching, leading to substantial gains across CIFAR-10/100, ImageNet, and neuromorphic datasets, while also showing robustness to noisy labels. The approach yields state-of-the-art results and provides a principled mechanism to reduce negative regularization, enabling more accurate and practical SNNs for energy-efficient AI.

Abstract

Spiking Neural Networks (SNNs) are promising for low-power computation due to their event-driven mechanism but often suffer from lower accuracy compared to Artificial Neural Networks (ANNs). ANN-to-SNN knowledge distillation can improve SNN performance, but previous methods either focus solely on label information, missing valuable intermediate layer features, or use a layer-wise approach that neglects spatial and temporal semantic inconsistencies, leading to performance degradation. To address these limitations, we propose a novel method called self-attentive spatio-temporal calibration (SASTC). SASTC uses self-attention to identify semantically aligned layer pairs between ANN and SNN, both spatially and temporally. This enables the autonomous transfer of relevant semantic information. Extensive experiments show that SASTC outperforms existing methods, effectively solving the mismatching problem. Reported accuracies on static datasets are 95.12% on CIFAR-10 and 79.40% on CIFAR-100 with 2 time steps, and 68.69% on ImageNet with 4 time steps; on neuromorphic datasets they are 97.92% on DVS-Gesture and 83.60% on DVS-CIFAR10. This marks the first time SNNs have outperformed ANNs on both CIFAR-10 and CIFAR-100, shedding new light on the potential applications of SNNs.
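
The attention-based matching described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the use of batch-wise similarity matrices as both attention inputs and distillation targets, and the randomly initialised projections `w_query` and `w_key` (which would be learned in practice) are all assumptions made for illustration. The sketch only shows the general idea of letting each SNN layer at each time step distribute a softmax weight over all ANN layers, instead of being tied to one fixed layer.

```python
import numpy as np

def similarity_matrix(feat):
    """Batch-wise similarity: [B, ...] -> [B, B], each row L2-normalised."""
    flat = feat.reshape(feat.shape[0], -1)
    gram = flat @ flat.T
    norm = np.linalg.norm(gram, axis=1, keepdims=True)
    return gram / np.maximum(norm, 1e-12)  # guard against all-zero rows

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(ann_feats, snn_feats, w_query, w_key):
    """
    ann_feats: list of L teacher arrays, each [B, ...]
    snn_feats: list of S student arrays, each [T, B, ...]
    w_query, w_key: hypothetical projections of shape [B, d]
    Returns alpha [S, T, B, L] plus the similarity matrices.
    """
    ann_sims = np.stack([similarity_matrix(f) for f in ann_feats])   # [L, B, B]
    snn_sims = np.stack([
        np.stack([similarity_matrix(f_t) for f_t in f])              # [T, B, B]
        for f in snn_feats
    ])                                                               # [S, T, B, B]
    keys = ann_sims @ w_key                                          # [L, B, d]
    queries = snn_sims @ w_query                                     # [S, T, B, d]
    scores = np.einsum("stbd,lbd->stbl", queries, keys)
    scores = scores / np.sqrt(w_query.shape[1])
    alpha = softmax(scores, axis=-1)  # weights over teacher layers
    return alpha, ann_sims, snn_sims

def calibrated_loss(alpha, ann_sims, snn_sims):
    """Attention-weighted distance between student and teacher similarities.

    Assumption: distances are taken between similarity-matrix rows so that
    layers with different feature shapes stay comparable.
    """
    diff = snn_sims[:, :, None] - ann_sims[None, None]  # [S, T, L, B, B]
    dist = (diff ** 2).mean(axis=-1)                    # [S, T, L, B]
    dist = np.moveaxis(dist, 2, -1)                     # [S, T, B, L]
    return float((alpha * dist).sum(axis=-1).mean())

# Toy data: 3 ANN layers, 2 SNN layers, T = 2 time steps, batch of 8.
rng = np.random.default_rng(0)
B, T, d = 8, 2, 16
ann_feats = [rng.normal(size=(B, 16, 8, 8)),
             rng.normal(size=(B, 32, 4, 4)),
             rng.normal(size=(B, 64, 2, 2))]
snn_feats = [(rng.random(size=(T, B, 16, 8, 8)) < 0.2).astype(np.float64),
             (rng.random(size=(T, B, 32, 4, 4)) < 0.2).astype(np.float64)]
w_query = rng.normal(size=(B, d)) * 0.1
w_key = rng.normal(size=(B, d)) * 0.1

alpha, ann_sims, snn_sims = attention_weights(ann_feats, snn_feats, w_query, w_key)
loss = calibrated_loss(alpha, ann_sims, snn_sims)
print(alpha.shape, loss)
```

In this sketch `alpha[s, t, b, :]` is the distribution over ANN layers for SNN layer `s` at time step `t` and sample `b`; a fixed layer-wise scheme would correspond to replacing it with a one-hot vector.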

Paper Structure (26 sections, 11 equations, 3 figures, 7 tables, 1 algorithm)

Figures (3)

  • Figure 1: An overview of the proposed Self-Attentive Spatio-Temporal Calibration.
  • Figure 2: Illustration of negative regularization on CIFAR-10 with three model combinations. Each tick label of x-axis denotes an SNN (student) layer number. Different color bars indicate the results of different specified ANN-SNN layer combinations.
  • Figure 3: Spike Activation Map (SAM) visualization of ANN-to-SNN distillation approaches on ImageNet. The red regions highlight areas deemed important for model inference.