The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, Zeliang Ma, Dengyi Ji, Haiwen Li, Xingliang Huang, Yu Tian, Genghua Kou, Fan Jia, Yingfei Liu, Tiancai Wang, Ying Li, Xiaoshuai Hao, Yifan Yang, Hui Zhang, Mengchuan Wei, Yi Zhou, Haimei Zhao, Jing Zhang, Jinke Li, Xiao He, Xiaoqiang Cheng, Bingyang Zhang, Lirong Zhao, Dianlei Ding, Fangsheng Liu, Yixiang Yan, Hongming Wang, Nanfei Ye, Lun Luo, Yubo Tian, Yiwei Zuo, Zhe Cao, Yi Ren, Yunfan Li, Wenjie Liu, Xun Wu, Yifan Mao, Ming Li, Jian Liu, Jiayang Liu, Zihan Qin, Cunxi Chu, Jialei Xu, Wenbo Zhao, Junjun Jiang, Xianming Liu, Ziyan Wang, Chiwei Li, Shilong Li, Chendong Yuan, Songyue Yang, Wentao Liu, Peng Chen, Bin Zhou, Yubo Wang, Chi Zhang, Jianhang Sun, Hai Chen, Xiao Yang, Lizhong Wang, Dongyi Fu, Yongchun Lin, Huitong Yang, Haoang Li, Yadan Luo, Xianjing Cheng, Yong Xu

TL;DR

The paper presents RoboDrive 2024, a comprehensive benchmark and competition framework for driving perception under out-of-distribution conditions across five tracks: robust BEV detection, robust HD map segmentation, robust semantic occupancy prediction, robust multi-view depth estimation, and robust multi-modal BEV detection. It documents a large-scale evaluation with 140 teams, nearly 1000 submissions, and 18 camera corruptions plus 3 LiDAR failure scenarios, driving significant advances via advanced data augmentation, multi-sensor fusion, and self-supervised learning for sensor-error correction. Top solutions across tracks exhibit notable robustness gains over baselines through temporal fusion, robust backbones, cross-modal fusion, and novel encoding schemes (e.g., PSC, CMT) that better handle corrupted data and sensor outages. The results establish a new benchmark for robust driving perception, highlight practical techniques for real-world deployment, and outline future directions in sensor integration, learning paradigms, standardization, and safety considerations to guide subsequent research and development.

Abstract

In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition threw down the gauntlet to innovate and enhance system resilience against typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated through our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies to enhance sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Participants, through collaborative efforts, pushed the boundaries of current technologies, showcasing their potential in real-world scenarios. Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research.

Paper Structure (72 sections, 5 equations, 27 figures, 5 tables)

Figures (27)

  • Figure 1: Challenge overview. The 2024 RoboDrive challenge aims to facilitate and encourage innovative solutions for tackling mainstream driving perception tasks under out-of-distribution (OoD) scenarios that occur in the real world. We are particularly interested in enhancing the OoD robustness of 3D object detection, HD map segmentation, semantic occupancy prediction, and multi-view depth estimation algorithms in challenging and unprecedented conditions, such as camera corruptions, camera failures, and LiDAR failures.
  • Figure 2: The TSMA-BEV framework: Multi-view images undergo sequence-consistent augmentations, are processed through an image encoder for 2D feature extraction, transformed into 3D space by a view transformer, and finally, detection is performed using concatenated temporal features for enhanced accuracy.
  • Figure 3: Examples of AugFFT-generated images, demonstrating the variety of frequency domain adjustments used to simulate and prepare for diverse environmental conditions, enhancing the model's generalization capabilities.
  • Figure 4: Pipeline of the Multi-View Enhancer (MVE) method developed by Team Ponyville, illustrating the integration of advanced computational techniques from data augmentation through to model training and object detection.
  • Figure 5: Examples of AugMix-enhanced data (Hendrycks et al., 2019) by Team Ponyville.
  • ...and 22 more figures