RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions
Shaoyuan Xie, Lingdong Kong, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, Ziwei Liu
TL;DR
RoboBEV introduces nuScenes-C, a comprehensive natural-corruption benchmark for camera-only BEV perception covering eight corruption types at three severity levels, and evaluates 26 BEV detectors on it. The study finds that absolute performance on clean data generally tracks performance under corruption, but relative robustness is highly model-dependent and is not guaranteed by strong clean performance alone. Key findings highlight the promise of depth-free BEV transformations, pre-training, and longer temporal fusion for improving robustness, and show that multi-modality fusion can mitigate camera degradation. These insights offer concrete guidance for designing BEV models that remain accurate and reliable under real-world corrupted conditions.
Abstract
Recent advances in camera-based bird's eye view (BEV) representation exhibit great potential for in-vehicle 3D perception. Despite the substantial progress achieved on standard benchmarks, the robustness of BEV algorithms, which is critical for safe operation, has not been thoroughly examined. To bridge this gap, we introduce RoboBEV, a comprehensive benchmark suite that encompasses eight distinct corruptions: Bright, Dark, Fog, Snow, Motion Blur, Color Quant, Camera Crash, and Frame Lost. Based on it, we undertake extensive evaluations across a wide range of BEV-based models to understand their resilience and reliability. Our findings indicate a strong correlation between absolute performance on in-distribution and out-of-distribution datasets. Nonetheless, relative performance varies considerably across approaches. Our experiments further demonstrate that pre-training and depth-free BEV transformation have the potential to enhance out-of-distribution robustness. Additionally, utilizing long and rich temporal information substantially improves robustness. Our findings provide valuable insights for designing future BEV models that can achieve both accuracy and robustness in real-world deployments.
