Towards Low-latency Event-based Visual Recognition with Hybrid Step-wise Distillation Spiking Neural Networks

Xian Zhong, Shengwang Hu, Wenxuan Liu, Wenxin Huang, Jianhao Ding, Zhaofei Yu, Tiejun Huang

TL;DR

This work disentangles the dependency between the number of event frames and the time steps of SNNs, utilizing more event frames during the training stage to improve performance, while using fewer event frames during the inference stage to reduce latency.

Abstract

Spiking neural networks (SNNs) have garnered significant attention for their low power consumption and high biological interpretability. Their rich spatio-temporal information processing capability and event-driven nature make them well suited to neuromorphic datasets. However, current SNNs struggle to balance accuracy and latency when classifying these datasets. In this paper, we propose the Hybrid Step-wise Distillation (HSD) method, tailored for neuromorphic datasets, to mitigate the notable decline in performance at lower time steps. Our work disentangles the dependency between the number of event frames and the time steps of SNNs, utilizing more event frames during the training stage to improve performance, while using fewer event frames during the inference stage to reduce latency. Nevertheless, the average output of SNNs across all time steps is susceptible to individual time steps with abnormal outputs, particularly at extremely low time steps. To tackle this issue, we implement a Step-wise Knowledge Distillation (SKD) module that considers variations in the output distribution of SNNs at each time step. Empirical evidence demonstrates that our method yields competitive performance in classification tasks on neuromorphic datasets, especially at lower time steps. Our code will be available at https://github.com/hsw0929/HSD.
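
The contrast the abstract draws, between distilling into the time-averaged SNN output and distilling at each time step, can be illustrated with a small NumPy sketch. This is one plausible reading of the idea, not the authors' implementation: the function names, the KL-divergence form of the loss, and the temperature `tau` are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits, tau=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    """KL(p || q) over the last axis."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def averaged_kd_loss(student_logits, teacher_logits, tau=1.0):
    """Hypothetical baseline: distill into the time-averaged student output.

    student_logits: (T, B, C) per-time-step SNN outputs
    teacher_logits: (B, C) ANN outputs
    """
    mean_logits = student_logits.mean(axis=0)
    return kl_div(softmax(teacher_logits, tau), softmax(mean_logits, tau)).mean()

def stepwise_kd_loss(student_logits, teacher_logits, tau=1.0):
    """Hypothetical step-wise variant: match the teacher at every time step."""
    p_teacher = softmax(teacher_logits, tau)      # (B, C)
    q_steps = softmax(student_logits, tau)        # (T, B, C)
    per_step = kl_div(p_teacher[None], q_steps)   # (T, B)
    return per_step.mean()

# Two time steps whose deviations cancel in the average but not per step.
teacher = np.zeros((1, 2))
student = np.array([[[2.0, -2.0]], [[-2.0, 2.0]]])
print(averaged_kd_loss(student, teacher), stepwise_kd_loss(student, teacher))
```

In this toy case the averaged loss cannot see that each individual step disagrees with the teacher, while the step-wise loss penalizes both steps, which is the kind of per-step sensitivity the abstract attributes to SKD.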

Paper Structure (22 sections, 13 equations, 5 figures, 6 tables, 1 algorithm)

This paper contains 22 sections, 13 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: Contrast of Vanilla, STS, DTS, and HSD methods. Both the training and inference stages in the Vanilla method utilize a uniform time step, often set to 10. STS maintains this time step solely during inference to minimize duration. DTS and HSD both adopt variable time steps, with HSD further segmenting the SNN training stage.
  • Figure 2: Performance comparison of TET with STS, TET with DTS, and HSD at time steps $T = 1$ to $5$ on CIFAR10-DVS, N-Caltech101, and DVS-Gesture.
  • Figure 3: (a) Overall framework of the proposed HSD, which comprises a pre-training phase and a fine-tuning phase. First, the raw event stream is integrated into event frames. The event frames from the neuromorphic dataset are then partitioned into two segments. In the pre-training phase, an ANN processes $T_1$ event frames to transmit rich spatial information to the SNN. In the fine-tuning phase, the ANN provides learned "Soft Labels" that guide the SNN's output at each time step. (b)--(c) illustrate the details of the two training phases.
  • Figure 4: Feature visualization for the initial spiking encoder. (a)--(d) depict HSD and TET [deng2022] at $T = 1$ and $T = 5$ on CIFAR10-DVS. (e) provides the corresponding feature visualizations for channel 37.
  • Figure 5: Test accuracy and test loss at each epoch for KD and SKD at $T = 5$ on CIFAR10-DVS.
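
The integration step mentioned in the Figure 3 caption, where a raw event stream becomes a chosen number of event frames, can be sketched as follows. This is a minimal illustration under stated assumptions: equal-duration time slicing, two polarity channels, and the function name `events_to_frames` are hypothetical choices, and the paper may integrate events differently.

```python
import numpy as np

def events_to_frames(t, x, y, p, num_frames, height, width):
    """Accumulate events into `num_frames` two-channel count frames.

    t: timestamps, x/y: integer pixel coordinates, p: polarity in {0, 1}.
    Events are assigned to frames by equal-duration time slices (an assumption).
    """
    frames = np.zeros((num_frames, 2, height, width), dtype=np.float32)
    if len(t) == 0:
        return frames
    t0, t1 = t.min(), t.max()
    span = max(t1 - t0, 1)
    # Map each timestamp to a slice index; the last event is clipped into the final slice.
    idx = np.minimum((t - t0) * num_frames // span, num_frames - 1).astype(np.int64)
    # Unbuffered accumulation so repeated (frame, polarity, y, x) indices all count.
    np.add.at(frames, (idx, p, y, x), 1.0)
    return frames

# The same stream integrated at two temporal resolutions (toy data).
rng = np.random.default_rng(0)
n = 1000
t = np.sort(rng.integers(0, 100_000, size=n))
x = rng.integers(0, 8, size=n)
y = rng.integers(0, 8, size=n)
p = rng.integers(0, 2, size=n)

train_frames = events_to_frames(t, x, y, p, num_frames=10, height=8, width=8)
infer_frames = events_to_frames(t, x, y, p, num_frames=2, height=8, width=8)
print(train_frames.shape, infer_frames.shape)
```

Because the frame count is just a parameter of the integration, the same recording can supply more frames during training and fewer at inference, which is the decoupling the TL;DR describes.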