Scene Adaptive Sparse Transformer for Event-based Object Detection
Yansong Peng, Hebei Li, Yueyi Zhang, Xiaoyan Sun, Feng Wu
TL;DR
This work tackles the high computational cost of Transformer-based object detection on event data by introducing SAST, a scene-adaptive sparse Transformer that performs window-token co-sparsification and dynamic sparsity optimization. It combines a scoring module, a selection module, and Masked Sparse Window Self-Attention to process sparse event streams efficiently and in a scene-aware manner. Empirical results on the 1Mpx and Gen1 datasets show that SAST achieves state-of-the-art mAP with substantially lower A-FLOPs and runtime, outperforming both dense and prior sparse networks, with further gains from the SAST-CB variant. The approach offers practical benefits for real-time, energy-efficient event-based detection across varying scenes and resolutions.
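The TL;DR does not specify how the scoring and selection modules decide what to keep. The sketch below is a minimal illustration of window-token co-sparsification under stated assumptions: each token already carries an importance score in [0, 1] (a hypothetical stand-in for the scoring module's output), and selection is a plain fixed threshold. `partition_windows`, `co_sparsify`, and `threshold` are illustrative names, not the SAST API, and the actual selection rule in the paper may differ.

```python
from typing import List, Tuple

Coord = Tuple[int, int]

def partition_windows(height: int, width: int, win: int) -> List[List[Coord]]:
    """Split an H x W token grid into non-overlapping win x win windows of (row, col) coordinates."""
    windows = []
    for r0 in range(0, height, win):
        for c0 in range(0, width, win):
            windows.append([(r, c)
                            for r in range(r0, min(r0 + win, height))
                            for c in range(c0, min(c0 + win, width))])
    return windows

def co_sparsify(scores: List[List[float]], win: int, threshold: float) -> List[List[Coord]]:
    """Hypothetical co-sparsification rule (assumption, not the paper's exact method).

    Token-level sparsity: keep tokens whose score exceeds `threshold`.
    Window-level sparsity: drop any window left with no kept tokens.
    Because the threshold is fixed rather than a fixed top-k, the number of kept
    tokens (and windows) grows or shrinks with how much activity the scene contains.
    """
    kept_windows = []
    for window in partition_windows(len(scores), len(scores[0]), win):
        tokens = [(r, c) for (r, c) in window if scores[r][c] > threshold]
        if tokens:  # empty windows are skipped entirely
            kept_windows.append(tokens)
    return kept_windows

# Hypothetical per-token importance scores for a 4 x 4 grid.
scores = [[0.9, 0.1, 0.0, 0.0],
          [0.2, 0.8, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.7]]
kept = co_sparsify(scores, win=2, threshold=0.5)
print(kept)  # [[(0, 0), (1, 1)], [(3, 3)]]
```

Here only 2 of 4 windows and 3 of 16 tokens survive; a busier score map would keep more of both, which is one simple way to make the sparsity level depend on scene content.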
Abstract
While recent Transformer-based approaches have shown impressive performance on event-based object detection tasks, their high computational costs still diminish the low-power advantage of event cameras. Image-based works attempt to reduce these costs by introducing sparse Transformers. However, they display inadequate sparsity and adaptability when applied to event-based object detection: they cannot balance the fine granularity of token-level sparsification against the efficiency of window-based Transformers, leading to reduced performance and efficiency. Furthermore, they lack scene-specific sparsity optimization, resulting in information loss and a lower recall rate. To overcome these limitations, we propose the Scene Adaptive Sparse Transformer (SAST). SAST enables window-token co-sparsification, significantly enhancing fault tolerance and reducing computational overhead. Leveraging novel scoring and selection modules, along with Masked Sparse Window Self-Attention, SAST shows remarkable scene-aware adaptability: it focuses only on important objects and dynamically optimizes the sparsity level according to scene complexity, maintaining a favorable balance between performance and computational cost. The evaluation results show that SAST outperforms all other dense and sparse networks in both performance and efficiency on two large-scale event-based object detection datasets (1Mpx and Gen1). Code: https://github.com/Peterande/SAST
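The abstract names Masked Sparse Window Self-Attention without giving its equations. As a hedged illustration only, the following sketch shows the generic idea of masked self-attention restricted to selected tokens within one window, assuming single-head attention with no learned projections. `masked_window_attention` and its arguments are hypothetical names; the real layer in the linked repository may differ in how it masks, projects, and handles unselected tokens.

```python
import math
from typing import List

Vector = List[float]

def masked_window_attention(x: List[Vector], keep: List[bool]) -> List[Vector]:
    """Single-head scaled dot-product self-attention inside one window.

    Simplifications (assumptions, not SAST's exact layer): queries, keys and values
    are the raw token features (no learned projections, no multi-head split).
    Kept tokens attend only to other kept tokens; unkept tokens are masked out as
    keys and are returned unchanged.
    """
    n, d = len(x), len(x[0])
    scale = 1.0 / math.sqrt(d)
    out = [list(v) for v in x]  # unkept tokens pass through untouched
    for i in range(n):
        if not keep[i]:
            continue  # masked queries are not computed at all
        # Masked logits: -inf for unkept keys, so softmax gives them zero weight.
        logits = [scale * sum(a * b for a, b in zip(x[i], x[j])) if keep[j] else -math.inf
                  for j in range(n)]
        m = max(logits)  # finite, since token i itself is kept
        weights = [math.exp(l - m) for l in logits]  # math.exp(-inf) == 0.0
        z = sum(weights)
        out[i] = [sum(w * x[j][c] for j, w in enumerate(weights)) / z for c in range(d)]
    return out

window = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
keep = [True, True, False]
out = masked_window_attention(window, keep)
print(out[2])  # [5.0, 5.0]: the masked token is unchanged
```

Because masked keys receive exactly zero weight, changing the features of an unselected token does not alter the outputs of selected tokens. An efficient implementation would gather only the kept tokens rather than materializing the masked entries, so the cost per window scales with the number of selected tokens.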
