
DSERT-RoLL: Robust Multi-Modal Perception for Diverse Driving Conditions with Stereo Event-RGB-Thermal Cameras, 4D Radar, and Dual-LiDAR

Hoonhee Cho, Jae-Young Kang, Yuhwan Jeong, Yunseo Yang, Wonyoung Lee, Youngho Kim, Kuk-Jin Yoon

Abstract

In this paper, we present DSERT-RoLL, a driving dataset that pairs stereo event, RGB, and thermal cameras with 4D radar and dual LiDAR, collected across diverse weather and illumination conditions. The dataset provides precise 2D and 3D bounding boxes with track IDs and ego-vehicle odometry, enabling fair comparisons within and across sensor combinations. It is designed to alleviate data scarcity for novel sensors such as event cameras and 4D radar and to support systematic studies of their behavior. We establish unified 3D and 2D benchmarks that enable direct comparison of characteristics and strengths across and within sensor families. We report baselines for representative single-modality and multi-modal methods and provide protocols that encourage research on different fusion strategies and sensor combinations. In addition, we propose a fusion framework that integrates sensor-specific cues into a unified feature space and improves the robustness of 3D detection under varied weather and lighting.



Figures (19)

  • Figure 1: The proposed DSERT-RoLL dataset comprises stereo event, RGB, and thermal cameras, together with 4D radar and dual LiDAR, collected in on-road driving across a wide range of weather and illumination conditions, and provided with precise 3D annotations.
  • Figure 2: Complementary scenarios across sensor families. (a–b) 3D range sensors: (a) LiDAR-dominant, effective in clear conditions and at long range with accurate geometry; (b) 4D radar-dominant, reliable in adverse weather (e.g., fog and snow) using Doppler. (c–e) Camera-based sensors: (c) RGB-dominant, strong in daylight and textured scenes; (d) Event-dominant, responsive to small and rapid motions and robust in high dynamic range; (e) Thermal-dominant, informative at night or in low light. Together, (a)–(e) illustrate complementary strengths across sensor types.
  • Figure 3: Distribution of training and testing data with respect to weather conditions, lighting conditions, and object classes.
  • Figure 4: Overview of the proposed multi-modal 3D detection framework. LiDAR and 4D radar features are voxelized and fused to generate initial 3D box proposals. RGB, thermal, and event features are then projected into 3D space via voxel-centric sampling and integrated through confidence-based fusion; a minimal sketch of this step appears after this list. The refined fused features are used for final bounding box prediction.
  • Figure 5: Class distribution across distance bins for the Bike, Pedestrian, and Vehicle categories. (a) shows the distribution in the training set; (b) shows the distribution in the test set.
  • ...and 14 more figures
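The voxel-centric sampling and confidence-based fusion described in the Figure 4 caption can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch rendering, not the authors' implementation: the projection convention, the shared confidence head, and the softmax weighting over modalities are all assumptions made for illustration.

```python
# Minimal sketch of voxel-centric sampling and confidence-based fusion
# (Figure 4). Shapes, the projection convention, and the weighting scheme
# are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


def sample_image_features(voxel_centers, feat_map, proj_mat, img_size):
    """Project 3D voxel centers into one camera and bilinearly sample features.

    voxel_centers: (N, 3) voxel centers in the ego frame.
    feat_map:      (C, H, W) 2D feature map from one camera branch.
    proj_mat:      (3, 4) assumed calibrated camera projection matrix.
    img_size:      (width, height) of the original image.
    """
    n = voxel_centers.shape[0]
    homo = torch.cat([voxel_centers, torch.ones(n, 1)], dim=1)   # (N, 4)
    uvw = homo @ proj_mat.T                                      # (N, 3)
    # Perspective division; a real system would also mask points behind
    # the camera or outside the image.
    uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack(
        [uv[:, 0] / img_size[0], uv[:, 1] / img_size[1]], dim=-1
    ) * 2.0 - 1.0
    grid = grid.view(1, 1, n, 2)
    sampled = F.grid_sample(feat_map.unsqueeze(0), grid,
                            align_corners=False)                 # (1, C, 1, N)
    return sampled.squeeze(0).squeeze(1).T                       # (N, C)


class ConfidenceFusion(nn.Module):
    """Fuse per-voxel features from several modalities with learned weights."""

    def __init__(self, dim):
        super().__init__()
        # A single shared confidence head across modalities (an assumption).
        self.conf = nn.Linear(dim, 1)

    def forward(self, feats):
        # feats: list of (N, C) per-modality feature tensors.
        stacked = torch.stack(feats, dim=0)        # (M, N, C)
        logits = self.conf(stacked)                # (M, N, 1)
        weights = torch.softmax(logits, dim=0)     # normalize over modalities
        return (weights * stacked).sum(dim=0)      # (N, C) fused features
```

In this reading, each camera branch contributes an (N, C) feature tensor for the N proposal voxels, and the fusion module learns, per voxel, how much to trust each modality, matching the caption's description of confidence-based integration of RGB, thermal, and event cues with the LiDAR-radar proposals.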