Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection
Xinhao Luo, Man Yao, Yuhong Chou, Bo Xu, Guoqi Li
TL;DR
This work tackles the limited performance of spiking neural networks (SNNs) on complex vision tasks such as object detection while retaining their low-power advantage. It introduces SpikeYOLO, a hybrid architecture that preserves the macro YOLO design while embedding Meta-SpikeFormer-inspired meta SNN blocks, and pairs it with the Integer Leaky Integrate-and-Fire (I-LIF) neuron, which trains with integer activations and performs spike-driven inference by extending virtual timesteps. The approach yields substantial gains on COCO ($66.2\%$ mAP@50, $48.9\%$ mAP@50:95) and strong neuromorphic results on Gen1 ($67.2\%$ mAP@50, with a $5.7\times$ energy-efficiency improvement over the ANN with equivalent architecture), demonstrating that carefully designed SNN architectures and training strategies can approach ANN performance in complex object detection. Overall, SpikeYOLO showcases a viable path for energy-efficient neuromorphic object detection by balancing architectural simplification, re-parameterization, and quantization-aware training via I-LIF.
Abstract
Brain-inspired Spiking Neural Networks (SNNs) offer bio-plausibility and low-power advantages over Artificial Neural Networks (ANNs). However, applications of SNNs are currently limited to simple classification tasks because of their poor performance. In this work, we focus on bridging the performance gap between ANNs and SNNs on object detection. Our design revolves around the network architecture and the spiking neuron. First, overly complex module design causes spike degradation when the YOLO series is converted to the corresponding spiking version. We design a SpikeYOLO architecture to solve this problem by simplifying the vanilla YOLO and incorporating meta SNN blocks. Second, object detection is more sensitive to the quantization errors introduced when spiking neurons convert membrane potentials into binary spikes. To address this challenge, we design a new spiking neuron that activates integer values during training while remaining spike-driven during inference by extending virtual timesteps. The proposed method is validated on both static and neuromorphic object detection datasets. On the static COCO dataset, we obtain 66.2% mAP@50 and 48.9% mAP@50:95, which are +15.0% and +18.7% higher than the prior state-of-the-art SNN, respectively. On the neuromorphic Gen1 dataset, we achieve 67.2% mAP@50, which is +2.5% higher than the ANN with equivalent architecture, and the energy efficiency is improved by 5.7×. Code: https://github.com/BICLab/SpikeYOLO
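The idea of training with integer activations and then inferring with binary spikes over extended virtual timesteps can be illustrated with a toy sketch. This is not the authors' implementation: the rounding rule, the clipping bound `max_int`, and the helper names `integer_activation`, `expand_to_spikes`, and `weighted_sum` are all assumptions made for illustration, and leak and reset dynamics are omitted. It only shows why, for a linear layer, an integer activation and its unrolled 0/1 spike train give the same result.

```python
import math

def integer_activation(membrane, max_int=4):
    """Assumed training-time activation: round the membrane potential
    to the nearest integer and clip it to [0, max_int]."""
    return max(0, min(max_int, math.floor(membrane + 0.5)))

def expand_to_spikes(value, max_int=4):
    """Assumed inference-time view: unroll an integer in [0, max_int]
    into max_int virtual timesteps, each carrying a binary spike."""
    return [1 if t < value else 0 for t in range(max_int)]

def weighted_sum(weights, inputs):
    """Plain linear layer with a single output unit."""
    return sum(w * x for w, x in zip(weights, inputs))

# Hypothetical membrane potentials and synaptic weights.
membranes = [0.2, 1.6, 2.4, 7.9]
weights = [0.5, -1.0, 2.0, 0.25]
max_int = 4

# Training view: one pass with integer-valued activations.
ints = [integer_activation(u, max_int) for u in membranes]
training_out = weighted_sum(weights, ints)

# Inference view: max_int passes, each seeing only 0/1 inputs,
# so every step reduces to adding the weights of active inputs.
trains = [expand_to_spikes(k, max_int) for k in ints]
inference_out = sum(
    weighted_sum(weights, [train[t] for train in trains])
    for t in range(max_int)
)

print(ints, training_out, inference_out)
```

Because the layer is linear, summing its output over the virtual timesteps reproduces the integer-valued result, which is what allows inference to stay spike-driven in this simplified setting.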
