PEOD: A Pixel-Aligned Event-RGB Benchmark for Object Detection under Challenging Conditions
Luoping Cui, Hanqing Liu, Mingjie Liu, Endian Lin, Donghong Jiang, Yuhao Wang, Chuang Zhu
TL;DR
PEOD addresses the lack of high-resolution, densely annotated Event-RGB data under challenging conditions by introducing a pixel-aligned 1280x720 dataset with 130+ sequences and 340k bounding boxes across six traffic classes. It deploys a dual-camera coaxial system for precise spatiotemporal alignment and provides a unified benchmark evaluating 14 detectors across Event, RGB, and Event-RGB fusion modalities, including subset analyses for illumination challenges. Key findings show fusion detectors offer the best overall performance on the full dataset, while strong event-based detectors provide superior robustness under challenging illumination, revealing current fusion limitations when RGB frames are severely degraded. The dataset supports future work in image reconstruction and long-term tracking, and highlights the need for reliability-aware, deeply coupled fusion strategies to maximize the utility of event information in real-world conditions.
Abstract
Robust object detection in challenging scenarios increasingly relies on event cameras, yet existing Event-RGB datasets remain constrained by sparse coverage of extreme conditions and low spatial resolution (<= 640 x 480), which prevents comprehensive evaluation of detectors under such conditions. To address these limitations, we propose PEOD, the first large-scale, pixel-aligned, high-resolution (1280 x 720) Event-RGB dataset for object detection under challenging conditions. PEOD contains 130+ spatiotemporally aligned sequences and 340k manually annotated bounding boxes, with 57% of the data captured under low light, overexposure, or high-speed motion. Furthermore, we benchmark 14 methods across three input configurations (Event-based, RGB-based, and Event-RGB fusion) on PEOD. On the full test set and the normal subset, fusion-based models achieve the best performance. However, on the illumination challenge subset, the top event-based model outperforms all fusion models, while fusion models still outperform their RGB-based counterparts, indicating the limits of existing fusion methods when the frame modality is severely degraded. PEOD establishes a realistic, high-quality benchmark for multimodal perception and facilitates future research.
