Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset

Xiao Wang, Yu Jin, Wentao Wu, Wei Zhang, Lin Zhu, Bo Jiang, Yonghong Tian

TL;DR

This work tackles event-based object detection by introducing MvHeat-DET, a Mixture-of-Experts heat-conduction backbone that processes event streams through multiple frequency-domain transform experts with learnable thermal diffusivity. An IoU-based query selection module (IQS) guides efficient token extraction for the detection head. To support future research, the authors introduce EvDET200K, a high-definition 10-class event dataset with 200k bounding boxes across 10,054 samples, and provide extensive benchmarking against more than 15 state-of-the-art detectors. The results show that the MoE heat-conduction framework achieves strong accuracy with favorable efficiency, outperforming CNN, Transformer, and SNN baselines, and the component analyses validate the contributions of IQS, vHeat encoding, and MoE. Overall, the paper advances interpretable, efficient event-based detection and provides a comprehensive dataset for robust evaluation.
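
To make the heat-conduction idea concrete, here is a minimal NumPy sketch. It assumes the vHeat-style formulation: a feature map is treated as a temperature field, moved into a frequency domain, damped by exp(-k·(ωx² + ωy²)·t), and transformed back. The two experts (a DCT expert with insulated borders and an FFT expert with periodic borders), the function names, and the dense softmax router are illustrative assumptions. They are not the paper's confirmed expert set or routing scheme. In a trained model the diffusivities k and the router weights would be learnable parameters; here they are plain arrays.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # spatial index (columns)
    mat = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0] *= np.sqrt(1.0 / n)
    mat[1:] *= np.sqrt(2.0 / n)
    return mat


def dct_heat_expert(u0: np.ndarray, k: float, t: float = 1.0) -> np.ndarray:
    """Heat conduction with insulated (Neumann) borders, solved in the DCT domain.

    u0: (C, H, W) feature map; k: thermal diffusivity; t: diffusion time.
    """
    _, h, w = u0.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    freq = ch @ u0 @ cw.T  # 2D DCT of every channel (matmul broadcasts over C)
    wy = np.pi * np.arange(h) / h
    wx = np.pi * np.arange(w) / w
    decay = np.exp(-k * (wy[:, None] ** 2 + wx[None, :] ** 2) * t)
    return ch.T @ (freq * decay) @ cw  # inverse 2D DCT


def fft_heat_expert(u0: np.ndarray, k: float, t: float = 1.0) -> np.ndarray:
    """Heat conduction with periodic borders, solved in the Fourier domain."""
    _, h, w = u0.shape
    wy = 2 * np.pi * np.fft.fftfreq(h)
    wx = 2 * np.pi * np.fft.rfftfreq(w)
    decay = np.exp(-k * (wy[:, None] ** 2 + wx[None, :] ** 2) * t)
    return np.fft.irfft2(np.fft.rfft2(u0) * decay, s=(h, w))


def moe_heat_block(u0: np.ndarray, ks, gate_w: np.ndarray):
    """Hypothetical dense mixture of heat-conduction experts.

    ks: one diffusivity per expert; gate_w: (num_experts, C) router weights.
    Returns the blended output and the softmax gate values.
    """
    experts = [dct_heat_expert, fft_heat_expert]
    pooled = u0.mean(axis=(1, 2))  # (C,) global descriptor per channel
    logits = gate_w @ pooled  # (num_experts,)
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()  # softmax routing weights
    out = sum(g * f(u0, k) for g, f, k in zip(gates, experts, ks))
    return out, gates
```

Passing k = 0 returns the input unchanged. Because the zero-frequency term never decays, every expert preserves each channel's mean while damping high-frequency detail, so the router only decides how to blend differently smoothed views of the same features.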

Abstract

Object detection in event streams has emerged as a cutting-edge research area, offering superior performance in low-light conditions, under motion blur, and during rapid motion. Current detectors are built on spiking neural networks, Transformers, or convolutional neural networks, each with its own limitations, such as restricted performance, high computational overhead, or limited local receptive fields. This paper introduces a novel MoE (Mixture of Experts) heat conduction-based object detection algorithm that strikes a strong balance between accuracy and computational efficiency. First, a stem network embeds the event data, which is then processed by our MoE-HCO blocks. Each block integrates multiple expert modules that mimic heat conduction within event streams. Next, an IoU-based query selection module efficiently extracts tokens, which are fed into a detection head that produces the final detections. We also introduce EvDET200K, a new benchmark dataset for event-based object detection. Captured with a high-definition Prophesee EVK4-HD event camera, it contains 10 categories, 200,000 bounding boxes, and 10,054 samples, each spanning 2 to 5 seconds. We further provide comprehensive results for over 15 state-of-the-art detectors, offering a solid foundation for future research and comparison. The source code of this paper will be released at: https://github.com/Event-AHU/OpenEvDET
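
The abstract states only that an IoU-based query selection module picks tokens for the detection head. One established way to build such a module, RT-DETR's IoU-aware query selection, trains each encoder token's class score to match the IoU of its predicted box and then keeps the top-K tokens by that score. The NumPy sketch below assumes that design. The function names, the best-overlap assignment (a simplification of the Hungarian matching used in DETR-style training), and the choice of K are hypothetical and not taken from this paper.

```python
import numpy as np


def box_iou(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between boxes a (N, 4) and b (M, 4) in (x1, y1, x2, y2) format."""
    lt = np.maximum(a[:, None, :2], b[None, :, :2])  # top-left of intersection
    rb = np.minimum(a[:, None, 2:], b[None, :, 2:])  # bottom-right of intersection
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def iou_aware_score_targets(pred_boxes, gt_boxes, gt_labels, num_classes):
    """Soft classification targets (training time, simplified assignment).

    Each token's target for the class of its best-overlapping GT box is that
    IoU; all other class targets stay 0.
    """
    iou = box_iou(pred_boxes, gt_boxes)  # (N, G)
    best = iou.argmax(axis=1)  # best GT index per token
    rows = np.arange(len(pred_boxes))
    targets = np.zeros((len(pred_boxes), num_classes))
    targets[rows, gt_labels[best]] = iou[rows, best]
    return targets


def select_queries(tokens: np.ndarray, class_logits: np.ndarray, k: int):
    """Keep the k encoder tokens with the highest (IoU-aware) confidence."""
    conf = class_logits.max(axis=1)  # best class score per token
    idx = np.argsort(-conf, kind="stable")[:k]
    return tokens[idx], idx
```

At inference only `select_queries` is needed. The target function matters during training, where it ties a token's confidence to the quality of its box so that top-K selection favors well-localized tokens over merely confident ones.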

Paper Structure

This paper contains 18 sections, 10 equations, 6 figures, and 9 tables.

Figures (6)

  • Figure 1: (a) Comparison of existing datasets and our proposed EvDET200K dataset for event-stream-based detection; (b) comparison of our proposed MvHeat-DET and existing SOTA detectors on the EvDET200K dataset.
  • Figure 2: An overview of our proposed event-based object detection framework, termed MvHeat-DET.
  • Figure 3: Illustration of some representative samples of our proposed EvDET200K dataset.
  • Figure 4: Visualization of all annotation information in the dataset. Top Left (Instance Count per Class) shows the number of instances of each class in the whole dataset; Top Right (Bounding Box Size Distribution) illustrates the distribution of bounding box sizes across the dataset; Bottom Left (Object Center Distribution) shows the relative (x, y) positions of object centers within the images; Bottom Right (Aspect Ratio Distribution) displays the distribution of object width-to-height ratios in the dataset.
  • Figure 5: Visualization of detection results from our detector and other detectors (MC: misclassification, UD: undetected, OD: over-detected, LD: large deviation).