PEOD: A Pixel-Aligned Event-RGB Benchmark for Object Detection under Challenging Conditions

Luoping Cui, Hanqing Liu, Mingjie Liu, Endian Lin, Donghong Jiang, Yuhao Wang, Chuang Zhu

TL;DR

PEOD addresses the lack of high-resolution, densely annotated Event-RGB data under challenging conditions by introducing a pixel-aligned 1280x720 dataset with 130+ sequences and 340k bounding boxes across six traffic classes. It deploys a dual-camera coaxial system for precise spatiotemporal alignment and provides a unified benchmark evaluating 14 detectors across Event, RGB, and Event-RGB fusion modalities, including subset analyses for illumination challenges. Key findings show fusion detectors offer the best overall performance on the full dataset, while strong event-based detectors provide superior robustness under challenging illumination, revealing current fusion limitations when RGB frames are severely degraded. The dataset supports future work in image reconstruction and long-term tracking, and highlights the need for reliability-aware, deeply coupled fusion strategies to maximize the utility of event information in real-world conditions.

Abstract

Robust object detection for challenging scenarios increasingly relies on event cameras, yet existing Event-RGB datasets remain constrained by sparse coverage of extreme conditions and low spatial resolution (<= 640 x 480), which prevents comprehensive evaluation of detectors under challenging scenarios. To address these limitations, we propose PEOD, the first large-scale, pixel-aligned, high-resolution (1280 x 720) Event-RGB dataset for object detection under challenging conditions. PEOD contains 130+ spatiotemporally aligned sequences and 340k manually annotated bounding boxes, with 57% of the data captured under low-light, overexposure, and high-speed motion. Furthermore, we benchmark 14 methods across three input configurations (Event-based, RGB-based, and Event-RGB fusion) on PEOD. On the full test set and the normal subset, fusion-based models achieve excellent performance. However, on the illumination-challenge subset, the top event-based model outperforms all fusion models, while fusion models still outperform their RGB-based counterparts, indicating the limits of existing fusion methods when the frame modality is severely degraded. PEOD establishes a realistic, high-quality benchmark for multimodal perception and facilitates future research.

Paper Structure

This paper contains 20 sections, 2 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: PEOD examples under diverse challenging conditions. Overexposure (rows 1–3), motion blur (row 2), and low-light (row 4). Each row presents the event stream (left) with its pixel-aligned RGB frame (right).
  • Figure 2: Overview of PEOD dataset and acquisition system. (a) The coaxial imaging system used to capture spatiotemporally aligned Event and RGB data. (b) Temporal distribution of the dataset, with 57.1% captured under challenging illumination conditions. (c) Sample aligned Event-RGB pairs from diverse driving scenarios.
  • Figure 3: Representative visualization results on our PEOD dataset. (a) Traffic intersection in normal scenario. (b) Rushing cars in overexposure scenario. (c) Traffic intersection in low-light scenario. (d) High-speed moving two-wheelers with motion blur. While the RGB-based detector (YOLOv8) effectively utilizes rich textures in the normal scene (a), fusion detectors (CAFR, EOLO) and the event-based detector show decisive advantages by leveraging event data in the challenging overexposure (b), low-light (c), and motion-blur (d) conditions where the RGB-based detector fails.