Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset
Xiao Wang, Yu Jin, Wentao Wu, Wei Zhang, Lin Zhu, Bo Jiang, Yonghong Tian
TL;DR
This work tackles event-based object detection by introducing MvHeat-DET, a detector built on a Mixture-of-Experts (MoE) heat-conduction backbone that processes event streams through multiple frequency-domain transform experts with learnable thermal diffusivity. An IoU-based query selection module (IQS) guides efficient token extraction for the detection head. To support future research, the authors introduce EvDET200K, a high-definition, 10-class event dataset with 200k bounding boxes across 10,054 samples. They also benchmark more than 15 state-of-the-art detectors on it. The results show that the MoE heat-conduction framework achieves strong accuracy with favorable efficiency and outperforms CNN, Transformer, and SNN baselines. Component analyses confirm the contributions of IQS, the vHeat encoding, and the MoE design. Overall, the paper advances interpretable, efficient event-based detection and provides a comprehensive dataset for robust evaluation.
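The heat-conduction idea behind the backbone can be illustrated with a small sketch. In one common formulation (used, for example, in vHeat), a feature map u is moved into a frequency domain. Each coefficient is then damped by exp(-k(ω_x² + ω_y²)t), where k is the thermal diffusivity, and the result is transformed back. The code below is a minimal, hypothetical sketch and not the authors' implementation. It makes three assumptions. First, there are two experts that differ only in their transform: an orthonormal DCT-II, which corresponds to insulated boundaries, and an orthonormal DST-I, which corresponds to zero-temperature boundaries. Second, a soft softmax gate is driven by the globally pooled input. Third, fixed scalar diffusivities stand in for the learnable ones. The names `heat_expert`, `moe_heat_block`, `gate_w`, `gate_b`, and `diffusivities` are illustrative.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct_basis(n):
    """Orthonormal DCT-II matrix and angular frequencies (insulated boundaries)."""
    basis = [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
              * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n)]
             for k in range(n)]
    return basis, [math.pi * k / n for k in range(n)]

def dst_basis(n):
    """Orthonormal DST-I matrix and angular frequencies (zero-temperature boundaries)."""
    scale = math.sqrt(2 / (n + 1))
    basis = [[scale * math.sin(math.pi * (k + 1) * (i + 1) / (n + 1)) for i in range(n)]
             for k in range(n)]
    return basis, [math.pi * (k + 1) / (n + 1) for k in range(n)]

def heat_expert(u0, basis_fn, k, t):
    """Diffuse a 2-D map u0 for time t with diffusivity k in the basis from basis_fn."""
    h, w = len(u0), len(u0[0])
    bh, fh = basis_fn(h)
    bw, fw = basis_fn(w)
    spec = matmul(matmul(bh, u0), transpose(bw))  # forward 2-D transform
    # Each frequency component decays as exp(-k * (wx^2 + wy^2) * t).
    spec = [[spec[m][n] * math.exp(-k * (fh[m] ** 2 + fw[n] ** 2) * t) for n in range(w)]
            for m in range(h)]
    return matmul(matmul(transpose(bh), spec), bw)  # inverse (basis is orthonormal)

def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_heat_block(u0, gate_w, gate_b, diffusivities, t=1.0):
    """Hypothetical soft mixture of two transform experts; k is fixed here, learnable in a real model."""
    experts = [dct_basis, dst_basis]
    h, w = len(u0), len(u0[0])
    pooled = sum(map(sum, u0)) / (h * w)  # global average pooling drives the gate
    gates = softmax([gw * pooled + gb for gw, gb in zip(gate_w, gate_b)])
    outs = [heat_expert(u0, fn, k, t) for fn, k in zip(experts, diffusivities)]
    mixed = [[sum(g * o[i][j] for g, o in zip(gates, outs)) for j in range(w)]
             for i in range(h)]
    return mixed, gates

# Toy example: a single "hot" pixel on an 8x8 event frame spreads out after diffusion.
frame = [[1.0 if (i, j) == (3, 3) else 0.0 for j in range(8)] for i in range(8)]
out, gates = moe_heat_block(frame, gate_w=[0.5, -0.5], gate_b=[0.0, 0.0],
                            diffusivities=[0.5, 0.5])
```

The DCT expert leaves the zero-frequency coefficient untouched. As a result, it preserves the total "heat" (the sum of the map) while spreading it out. A larger k or t smooths the map more aggressively, and setting t = 0 returns the input unchanged.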
Abstract
Object detection in event streams has emerged as a cutting-edge research area, with strong performance in low-light conditions, under motion blur, and during rapid movement. Current detectors use spiking neural networks (SNNs), Transformers, or convolutional neural networks (CNNs) as their core architectures. Each has its own limitations: restricted performance, high computational overhead, and limited local receptive fields, respectively. This paper introduces a novel object detection algorithm based on Mixture-of-Experts (MoE) heat conduction that strikes a strong balance between accuracy and computational efficiency. First, a stem network embeds the event data, and the resulting features are processed by our MoE-HCO blocks. Each block integrates multiple expert modules that simulate heat conduction within the event streams. An IoU-based query selection module then extracts tokens efficiently, and these tokens are fed into a detection head that produces the final detections. We also introduce EvDET200K, a new benchmark dataset for event-based object detection. Captured with a high-definition Prophesee EVK4-HD event camera, the dataset contains 10 categories, 200,000 bounding boxes, and 10,054 samples, each spanning 2 to 5 seconds. In addition, we report results for more than 15 state-of-the-art detectors, providing a solid foundation for future research and comparison. The source code of this paper will be released at: https://github.com/Event-AHU/OpenEvDET
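The query-selection step can be sketched in the spirit of the IoU-aware query selection used in RT-DETR. During training, each encoder token's classification score is supervised with a soft target equal to the IoU between its predicted box and the best-matching ground-truth box. At inference, taking the top-K tokens by confidence then favors tokens that are both confident and well localized. How IQS actually scores tokens in this paper is an assumption here, and this is not the paper's code. The functions `box_iou`, `iou_aware_targets`, and `select_queries` and the (x1, y1, x2, y2) box format are hypothetical.

```python
import heapq

def box_iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def iou_aware_targets(pred_boxes, gt_boxes):
    """Hypothetical training-time soft target per token: IoU with its best-matching GT box."""
    return [max((box_iou(p, g) for g in gt_boxes), default=0.0) for p in pred_boxes]

def select_queries(cls_scores, k):
    """Indices of the k tokens with the highest class confidence, best first."""
    conf = [max(scores) for scores in cls_scores]
    return heapq.nlargest(k, range(len(conf)), key=conf.__getitem__)

# Toy example: three tokens and one ground-truth box.
pred_boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
gt_boxes = [(0, 0, 10, 10)]
targets = iou_aware_targets(pred_boxes, gt_boxes)  # ≈ [1.0, 0.143, 0.0]
cls_scores = [[0.9, 0.1], [0.3, 0.6], [0.2, 0.05]]
queries = select_queries(cls_scores, k=2)  # → [0, 1], tokens passed to the detection head
```

Using `heapq.nlargest` avoids a full sort when only K of many encoder tokens are kept, which matches the goal of cheap token extraction before the detection head.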
