Toward Deep Representation Learning for Event-Enhanced Visual Autonomous Perception: the eAP Dataset

Jinghang Li, Shichao Li, Qing Lian, Peiliang Li, Xiaozhi Chen, Yi Zhou

Abstract

Recent visual autonomous perception systems achieve remarkable performance with deep representation learning. However, they fail in scenarios with challenging illumination. While event cameras can mitigate this problem, there is no large-scale dataset for developing event-enhanced deep visual perception models in autonomous driving scenes. To address this gap, we present the eAP (event-enhanced Autonomous Perception) dataset, the largest dataset with event cameras for autonomous perception. We demonstrate how eAP can facilitate the study of different autonomous perception tasks, including 3D vehicle detection and object time-to-contact (TTC) estimation, through deep representation learning. Based on eAP, we demonstrate the first successful use of events to improve a popular 3D vehicle detection network in challenging illumination scenarios. eAP also enables a dedicated study of the representation learning problem of object TTC estimation. We show how a geometry-aware representation learning framework leads to the best event-based object TTC estimation network, which operates at 200 FPS. The dataset, code, and pre-trained models will be made publicly available for future research.

Paper Structure (28 sections, 12 equations, 13 figures, 9 tables)

Figures (13)

  • Figure -1: Sensor configuration for the eAP dataset. The event camera and RGB camera are rigidly mounted with a narrow baseline of 3 cm, as highlighted in the zoomed-in image. The positions and coordinate systems of the sensors are depicted in the bird's eye view.
  • Figure 0: Projected point cloud on the RGB image after using Calib-Anything [luo2023calib] to calibrate the extrinsic parameters between the LiDAR and the RGB camera. Different colors represent the distinct clusters identified during the clustering process in [luo2023calib]. The ground points are omitted for clearer visualization.
  • Figure 2: (Top) Bird's eye view (BEV) visualization of the LiDAR point cloud with annotated 3D bounding boxes demonstrating the 3D annotation precision; (Bottom) Plot of velocity curves of exemplar object trajectories.
  • Figure 3: Exemplar frames, event data, and object TTC annotations from the eAP dataset under varying weather, location, time, and illumination conditions. The 2D bounding boxes show projections of the annotated 3D boxes, along with categories and ground truth TTC values. Event data demonstrates superior quality in over-exposed and low-light scenes, highlighting the benefits of the event modality. Bounding boxes of objects with high visibility are visualized, and a complete visualization of all boxes is included in the supplementary video.
  • Figure 4: Visualization of additional ground-truth annotation types planned for future releases of the eAP dataset, including (A) LiDAR point clouds, (B) depth maps, (C) optical flow, (D) 2D annotations, (E) dense TTC maps, and (F) segmentation maps.
  • ...and 8 more figures