Accurate and Efficient Event-based Semantic Segmentation Using Adaptive Spiking Encoder-Decoder Network
Rui Zhang, Luziwei Leng, Kaiwei Che, Hu Zhang, Jie Cheng, Qinghai Guo, Jiangxing Liao, Ran Cheng
TL;DR
This work tackles the challenge of deploying efficient spiking neural networks for dense, event-based semantic segmentation. It introduces SpikingEDN, a spiking encoder-decoder that uses AiLIF-based adaptive threshold encoding in the first layer and a dual-path SSAM module to enhance sparse event representation while remaining compatible with multiply-free inference. An architecture-search strategy refines the encoder design, and the SSAM module enables effective fusion of event streams with grayscale inputs. The network achieves an MIoU of 72.57% on DDD17 and 58.32% on DSEC-Semantic, with substantially lower energy consumption than comparable ANNs. The results demonstrate the untapped potential of SNNs for high-level vision tasks on neuromorphic-friendly hardware and offer a practical path toward energy-efficient edge deployment, with source code to be released publicly for reproducibility.
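To make the idea of an adaptive firing threshold concrete, the following is a minimal sketch of a generic leaky integrate-and-fire neuron whose threshold rises after each spike and decays back toward a baseline. It is not the AiLIF neuron used in SpikingEDN, whose exact formulation is not given here; the function name `adaptive_lif` and the parameters `tau`, `v_th0`, `beta`, and `rho` are assumptions chosen for illustration.

```python
def adaptive_lif(inputs, tau=2.0, v_th0=1.0, beta=0.5, rho=0.9):
    """Simulate one LIF neuron with an activity-dependent threshold.

    Hypothetical illustration only: a generic adaptive-threshold LIF,
    not the AiLIF neuron from the paper.
    Returns (spikes, thresholds), one entry per time step.
    """
    v = 0.0  # membrane potential
    a = 0.0  # adaptation trace, grows with recent spiking
    spikes, thresholds = [], []
    for x in inputs:
        v = v + (x - v) / tau      # leaky integration toward the input
        th = v_th0 + beta * a      # threshold = baseline + adaptation
        s = 1 if v >= th else 0    # binary spike
        if s:
            v = 0.0                # hard reset after a spike
        a = rho * a + s            # trace decays, then jumps on a spike
        spikes.append(s)
        thresholds.append(th)
    return spikes, thresholds


# Constant drive: the adaptive neuron should fire less often than a
# fixed-threshold neuron (beta = 0) receiving the same input.
drive = [2.0] * 20
adaptive_spikes, _ = adaptive_lif(drive)
fixed_spikes, _ = adaptive_lif(drive, beta=0.0)
print(sum(adaptive_spikes), sum(fixed_spikes))
```

In this toy setting, the raised threshold suppresses repeated firing under sustained input, which is one intuition for why threshold adaptation can increase sparsity.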
Abstract
Spiking neural networks (SNNs), known for their low-power, event-driven computation and intrinsic temporal dynamics, are emerging as promising solutions for processing dynamic, asynchronous signals from event-based sensors. Despite their potential, SNNs face challenges in training and architectural design, resulting in limited performance on challenging event-based dense prediction tasks compared to artificial neural networks (ANNs). In this work, we develop an efficient spiking encoder-decoder network (SpikingEDN) for large-scale event-based semantic segmentation tasks. To enhance the learning efficiency from dynamic event streams, we harness an adaptive threshold, which improves network accuracy, sparsity, and robustness in streaming inference. Moreover, we develop a dual-path Spiking Spatially-Adaptive Modulation (SSAM) module, which is specifically tailored to enhance the representation of sparse events and multi-modal inputs, thereby considerably improving network performance. Our SpikingEDN attains a mean intersection over union (MIoU) of 72.57% on the DDD17 dataset and 58.32% on the larger DSEC-Semantic dataset, achieving results competitive with state-of-the-art ANNs while requiring substantially fewer computational resources. Our results shed light on the untapped potential of SNNs in event-based vision applications. The source code will be made publicly available.
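For readers unfamiliar with the reported metric, here is a minimal sketch of how mean intersection over union is commonly computed from flattened per-pixel labels. The function name `mean_iou` and the `ignore_index` parameter are assumptions for illustration; the evaluation protocol behind the reported numbers (for example, how classes absent from an image are handled, or whether counts are accumulated over the whole dataset) may differ.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over classes that occur in the prediction or ground truth.

    pred, target: equal-length sequences of integer class labels
    (flattened per-pixel labels). Hypothetical helper for illustration.
    """
    inter = [0] * num_classes  # true positives per class
    union = [0] * num_classes  # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue           # skip unlabeled pixels
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[p] += 1      # false positive for the predicted class
            union[t] += 1      # false negative for the true class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
# Per-class IoUs are 1/3, 2/3 and 1/2, so the mean is about 0.5.
print(mean_iou(pred, target, num_classes=3))
```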
