Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection

Xinhao Luo, Man Yao, Yuhong Chou, Bo Xu, Guoqi Li

TL;DR

This work tackles the limited performance of spiking neural networks (SNNs) in complex vision tasks like object detection while retaining their low-power advantage. It introduces SpikeYOLO, a hybrid architecture that preserves the macro YOLO design while embedding Meta-SpikeFormer-inspired meta SNN blocks, and pairs it with the Integer Leaky Integrate-and-Fire (I-LIF) neuron, which trains with integer activations and performs spike-driven inference by extending virtual timesteps. The approach yields substantial gains on COCO ($66.2\%$ mAP@50, $48.9\%$ mAP@50:95) and strong neuromorphic results on Gen1 ($67.2\%$ mAP@50, with $5.7\times$ better energy efficiency than the ANN with equivalent architecture), demonstrating that carefully designed SNN architectures and training strategies can approach ANN performance in complex object detection. Overall, SpikeYOLO showcases a viable path for energy-efficient neuromorphic object detection by combining architectural simplification, re-parameterization, and quantization-aware training via I-LIF.

Abstract

Brain-inspired Spiking Neural Networks (SNNs) have bio-plausibility and low-power advantages over Artificial Neural Networks (ANNs). Applications of SNNs are currently limited to simple classification tasks because of their poor performance. In this work, we focus on bridging the performance gap between ANNs and SNNs on object detection. Our design revolves around the network architecture and the spiking neuron. First, the overly complex module design causes spike degradation when the YOLO series is converted to the corresponding spiking version. We design a SpikeYOLO architecture to solve this problem by simplifying the vanilla YOLO and incorporating meta SNN blocks. Second, object detection is more sensitive to the quantization errors that arise when spiking neurons convert membrane potentials into binary spikes. To address this challenge, we design a new spiking neuron that activates integer values during training while remaining spike-driven during inference by extending virtual timesteps. The proposed method is validated on both static and neuromorphic object detection datasets. On the static COCO dataset, we obtain 66.2% mAP@50 and 48.9% mAP@50:95, which are 15.0% and 18.7% higher than the prior state-of-the-art SNN, respectively. On the neuromorphic Gen1 dataset, we achieve 67.2% mAP@50, which is 2.5% higher than the ANN with equivalent architecture, and the energy efficiency is improved by $5.7\times$. Code: https://github.com/BICLab/SpikeYOLO

Paper Structure (16 sections, 17 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: The overall architecture of SpikeYOLO. We design two SNN blocks, SNN-Block-1 and SNN-Block-2, and keep the rest of the architecture the same as YOLOv8. SNN-Block-1 employs standard convolution within its $\mathrm{ChannelConv}(\cdot)$ component, whereas SNN-Block-2 uses re-parameterization convolution; that is, the difference between the two blocks is the channel mixer module. We use SNN-Block-1 in the low stages and SNN-Block-2 in the high stages. The spiking neuron is I-LIF, which activates integer values during training and converts them to binary spikes during inference.
  • Figure 2: Comparison of I-LIF and LIF. LIF emits binary spikes during both training and inference, which results in quantization errors. I-LIF emits integer values during training to reduce quantization errors and converts them into binary spikes during inference, so that the network performs only sparse additions.
  • Figure 3: An example of how the proposed I-LIF works. We assume $T=3$ and $D=2$, and show the binary spike sequences that correspond to each integer value during inference. Membrane potentials in $\left[0.5,1.5\right)$ are quantized to 1, while those in $\left[1.5,2.5\right)$ are quantized to 2. Membrane potentials above $2.5$ are also quantized to 2 because the maximum integer value is $D=2$. Subsequently, the emitted integer value is subtracted from the membrane potential. A training spike with a value of 2 is converted into two binary spikes by extending virtual timesteps during inference.
  • Figure 4: The object detection results on the COCO dataset. The first two columns compare the effect of the maximum integer value $D$ on performance for the same structure. The second and third columns compare the effect of model size on performance.
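
The quantization and spike expansion described in the Figure 3 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the leak factor `beta`, its default value, and the placement of the ones inside each group of virtual timesteps are all assumptions made for the example. Only the quantization intervals, the clipping at $D$, the subtraction of the emitted value, and the "one integer becomes that many binary spikes" rule come from the caption.

```python
import math
from typing import List


def quantize(u: float, d: int) -> int:
    """Map a membrane potential to an integer in [0, d].

    Uses round-half-up so that [0.5, 1.5) -> 1 and [1.5, 2.5) -> 2,
    then clips at the maximum integer value d.
    """
    return max(0, min(d, math.floor(u + 0.5)))


def ilif_train_step(inputs: List[float], d: int, beta: float = 0.5) -> List[int]:
    """Run one neuron over T timesteps and return its integer activations.

    beta is a hypothetical leak factor chosen for illustration only.
    """
    h = 0.0  # membrane potential carried over from the previous timestep
    activations = []
    for x in inputs:
        u = h + x            # integrate the input current
        s = quantize(u, d)   # emit an integer value during training
        h = beta * (u - s)   # subtract the emitted value, then leak
        activations.append(s)
    return activations


def expand_to_binary(activations: List[int], d: int) -> List[List[int]]:
    """Expand each integer s into d virtual timesteps containing s binary spikes.

    Placing the ones first is an assumption; only their count matters here.
    """
    return [[1] * s + [0] * (d - s) for s in activations]


if __name__ == "__main__":
    ints = ilif_train_step([1.8, 0.3, 2.9], d=2)  # T = 3, D = 2
    print(ints)
    print(expand_to_binary(ints, d=2))
```

The number of binary spikes in each expanded group equals the integer emitted at that timestep, so an inference pass over the $T \times D$ virtual timesteps carries the same total activation as the integer-valued training pass.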