RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions

Shaoyuan Xie, Lingdong Kong, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, Ziwei Liu

TL;DR

RoboBEV introduces nuScenes-C, a comprehensive natural-corruption benchmark for camera-only BEV perception, sweeping eight corruption types with three severity levels and evaluating 26 BEV detectors. The study reveals that while absolute performance on clean data generally trends with performance under corruption, robustness is highly model-dependent and not guaranteed by strong clean performance alone. Key findings highlight the promise of depth-free BEV transformations, pre-training, and longer temporal fusion for improving robustness, and show that multi-modality fusion can mitigate camera degradation. These insights offer concrete guidance for designing BEV models that maintain accuracy and reliability in real-world, corrupted conditions.

Abstract

Recent advances in camera-based bird's eye view (BEV) representation show great potential for in-vehicle 3D perception. Despite substantial progress on standard benchmarks, the robustness of BEV algorithms, which is critical for safe operation, has not been thoroughly examined. To bridge this gap, we introduce RoboBEV, a comprehensive benchmark suite that encompasses eight distinct corruptions: Bright, Dark, Fog, Snow, Motion Blur, Color Quant, Camera Crash, and Frame Lost. Based on it, we undertake extensive evaluations across a wide range of BEV-based models to understand their resilience and reliability. Our findings indicate a strong correlation between absolute performance on in-distribution and out-of-distribution datasets. Nonetheless, relative performance varies considerably across approaches. Our experiments further demonstrate that pre-training and depth-free BEV transformation have the potential to enhance out-of-distribution robustness. Additionally, utilizing long and rich temporal information substantially improves robustness. Our findings provide valuable insights for designing future BEV models that can achieve both accuracy and robustness in real-world deployments.
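
The distinction the abstract draws between absolute and relative performance can be made concrete with a small sketch. It assumes a resilience rate defined as the ratio of a model's mean score under one corruption type (averaged over severity levels) to its clean score, with the mean resilience rate taken over corruption types. That definition, the function names, and all numbers below are illustrative assumptions, not values or code from the benchmark.

```python
from __future__ import annotations

from statistics import mean


def resilience_rate(clean_score: float, corrupted_scores: list[float]) -> float:
    """Mean corrupted score divided by the clean score (assumed definition)."""
    if clean_score <= 0:
        raise ValueError("clean_score must be positive")
    return mean(corrupted_scores) / clean_score


def mean_resilience_rate(clean_score: float,
                         scores_by_corruption: dict[str, list[float]]) -> float:
    """Average the per-corruption resilience rates (relative performance)."""
    return mean(resilience_rate(clean_score, scores)
                for scores in scores_by_corruption.values())


def mean_corrupted_score(scores_by_corruption: dict[str, list[float]]) -> float:
    """Average the raw corrupted scores (absolute performance)."""
    return mean(mean(scores) for scores in scores_by_corruption.values())


# Made-up scores for two hypothetical models, three severity levels each.
MODEL_A = {"clean": 0.60,
           "corrupted": {"Fog": [0.50, 0.40, 0.30], "Dark": [0.36, 0.30, 0.24]}}
MODEL_B = {"clean": 0.40,
           "corrupted": {"Fog": [0.38, 0.34, 0.30], "Dark": [0.30, 0.26, 0.22]}}

for name, model in (("A", MODEL_A), ("B", MODEL_B)):
    absolute = mean_corrupted_score(model["corrupted"])
    relative = mean_resilience_rate(model["clean"], model["corrupted"])
    print(f"model {name}: absolute={absolute:.3f} relative={relative:.3f}")
```

With these made-up numbers, model A keeps the higher absolute score under corruption while model B retains a larger fraction of its clean score, which is the kind of divergence between absolute and relative performance the abstract describes.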

Paper Structure

This paper contains 32 sections, 3 equations, 18 figures, and 33 tables.

Figures (18)

  • Figure 1: Radar charts of the nuScenes Detection Score (NDS) (Caesar et al., 2020) of existing BEV detectors under eight corruption types. Models behave very differently even when their "clean" performance is competitive. The NDS is normalized across all benchmarked BEV models to lie between 0.1 and 1.
  • Figure 2: Corruption examples from the RoboBEV benchmark. Left: Corruption taxonomy. Right: Temporal corruptions. Camera Crash drops a fixed set of images along timestamps; Frame Lost randomly drops frames along timestamps. More examples are given in the appendix.
  • Figure 3: Performance on nuScenes-C improves as performance on the "clean" nuScenes dataset (Caesar et al., 2020) improves, and the relation in absolute performance is close to linear. In relative terms, however, the mRR metric is scattered, with no clear increasing trend.
  • Figure 4: Depth estimation error vs. Resilience Rate. We observe strong correlations: large depth estimation errors under Snow and Dark tend to cause drastic performance drops.
  • Figure 5: Depth estimation results of BEVDepth (Li et al., 2022) under different corruptions. Sensitivity differs across corruption types.
  • ...and 13 more figures
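
The two temporal corruptions described in the Figure 2 caption can be sketched as keep/drop masks over a grid of timestamps and cameras. This is a minimal illustration under stated assumptions, not the benchmark's implementation: the function names, the `drop_prob` and `seed` parameters, and the choice to represent a dropped image as `False` are all hypothetical, and the source does not say how dropped images are filled in.

```python
import random


def camera_crash_mask(num_timestamps, num_cameras, crashed_cameras):
    """Drop the same fixed set of cameras at every timestamp.

    Returns mask[t][c], True if the image from camera c at time t is kept.
    """
    crashed = set(crashed_cameras)
    if any(cam < 0 or cam >= num_cameras for cam in crashed):
        raise ValueError("crashed camera index out of range")
    row = [cam not in crashed for cam in range(num_cameras)]
    # Every timestamp shares the same pattern of missing cameras.
    return [list(row) for _ in range(num_timestamps)]


def frame_lost_mask(num_timestamps, num_cameras, drop_prob, seed=0):
    """Drop each (timestamp, camera) image independently at random.

    drop_prob is a hypothetical knob; the benchmark's own settings
    are not given in the text above.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducibility
    return [[rng.random() >= drop_prob for _ in range(num_cameras)]
            for _ in range(num_timestamps)]


if __name__ == "__main__":
    crash = camera_crash_mask(num_timestamps=4, num_cameras=6, crashed_cameras=[1, 4])
    lost = frame_lost_mask(num_timestamps=4, num_cameras=6, drop_prob=0.3, seed=42)
    for label, mask in (("Camera Crash", crash), ("Frame Lost", lost)):
        print(label)
        for row in mask:
            print("  " + "".join("#" if keep else "." for keep in row))
```

In the Camera Crash mask every row is identical, since the same cameras are missing throughout the sequence, whereas the Frame Lost mask varies from timestamp to timestamp.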