A Black-Box Evaluation Framework for Semantic Robustness in Bird's Eye View Detection

Fu Wang; Yanghao Zhang; Xiangyu Yin; Guangliang Cheng; Zeyu Fu; Xiaowei Huang; Wenjie Ruan

TL;DR

The paper tackles the problem of evaluating worst-case robustness of camera-based BEV detectors under semantic perturbations in a black-box setting. It introduces a distance-based surrogate objective that aligns with BEV box matching and a deterministic global optimizer, SimpleDIRECT, which uses a simplified node-selection strategy to efficiently locate adversarial perturbations. Through extensive experiments on nuScenes across ten BEV models, the framework demonstrates superior ability to reveal vulnerabilities compared to random perturbations and baseline optimizers, with PolarFormer showing the strongest robustness and BEVDet being highly susceptible. A full validation-set case study confirms the framework’s practical utility and highlights how temporal information can influence robustness, underscoring the need for robust BEV designs in real-world autonomous driving systems.
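
The TL;DR does not define the surrogate objective. The sketch below is only an illustration of one way a smoothed, distance-based stand-in for mAP could be built. It assumes nuScenes-style matching on ground-plane centre distance at thresholds of 0.5, 1, 2 and 4 m, and replaces each hard "distance below threshold" test with a sigmoid. Several details are hypothetical and are not the authors' definitions: the function name `smoothed_match_score`, the `temperature` parameter, and the nearest-prediction matching. The real nuScenes evaluation matches greedily, per class, in order of prediction confidence.

```python
import math
from typing import Sequence, Tuple

# nuScenes detection matches predictions to ground truth by ground-plane
# centre distance at these thresholds (metres).
THRESHOLDS = (0.5, 1.0, 2.0, 4.0)

Point = Tuple[float, float]


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def smoothed_match_score(
    gt_centres: Sequence[Point],
    pred_centres: Sequence[Point],
    temperature: float = 0.25,
) -> float:
    """Hypothetical smooth surrogate for distance-based box matching.

    For each ground-truth centre, take the distance to the nearest predicted
    centre (a simplification of nuScenes' greedy, class-aware matching) and
    replace each hard test "distance < threshold" with a sigmoid. The result
    lies between 0 and 1; an attack would try to drive it towards 0.
    """
    if not gt_centres:
        return 0.0
    total = 0.0
    for gx, gy in gt_centres:
        # With no predictions the distance is infinite and every soft match is 0.
        d = min(
            (math.hypot(gx - px, gy - py) for px, py in pred_centres),
            default=math.inf,
        )
        soft = sum(_sigmoid((t - d) / temperature) for t in THRESHOLDS)
        total += soft / len(THRESHOLDS)
    return total / len(gt_centres)
```

Unlike mAP, which changes only when a box crosses a threshold, this score moves continuously as predicted centres drift. That gives a black-box optimiser a usable signal even when a small perturbation does not yet flip any match.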

Abstract

Camera-based Bird's Eye View (BEV) perception models have received increasing attention for their crucial role in autonomous driving, a domain where concerns have been raised about the robustness and reliability of deep learning. Only a few works have investigated the effects of randomly generated semantic perturbations, also known as natural corruptions, on multi-view BEV detection. We develop a black-box robustness evaluation framework that adversarially optimises three common semantic perturbations (geometric transformation, colour shifting, and motion blur) to deceive BEV models, the first approach of its kind in this emerging field. To address the challenges of optimising these semantic perturbations, we design a smoothed, distance-based surrogate function to replace the mAP metric and introduce SimpleDIRECT, a deterministic optimisation algorithm that uses observed slopes to guide the optimisation process. Comparisons with randomised perturbations and two optimisation baselines demonstrate the effectiveness of the proposed framework. Additionally, we provide a benchmark of the semantic robustness of ten recent BEV models. The results reveal that PolarFormer, which emphasises geometric information from multi-view images, exhibits the highest robustness, whereas BEVDet is fully compromised, with its precision reduced to zero.
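
The abstract describes SimpleDIRECT only at a high level. The following is a minimal sketch of a DIRECT-style (DIviding RECTangles) search over perturbation parameters normalised to the unit hypercube. It is not the authors' algorithm. It uses a plainly simplified selection rule: it takes the best-valued rectangle from each group of equal side lengths, as in "aggressive" DIRECT variants, instead of the usual convex-hull test. It also trisects only one longest side per step. SimpleDIRECT's own slope-based node selection is not reproduced. The name `simple_direct` and its parameters are hypothetical. In an attack, `f` would map normalised parameters (for example, a rotation angle, a colour shift and a blur length, each rescaled to [0, 1]) to a surrogate score to be minimised.

```python
import math
from typing import Callable, List, Sequence, Tuple


def simple_direct(
    f: Callable[[Sequence[float]], float], dim: int, max_evals: int = 300
) -> Tuple[List[float], float]:
    """Hypothetical DIRECT-style minimiser over the unit hypercube [0, 1]^dim.

    Each rectangle is stored as (value at centre, centre, levels), where
    levels[i] = k means the side along dimension i has length 3 ** -k.
    """
    centre = [0.5] * dim
    rects = [(f(centre), centre, [0] * dim)]
    evals = 1
    best_val, best_x = rects[0][0], list(centre)

    # Every division costs two new evaluations, so stop before exceeding the budget.
    while evals + 2 <= max_evals:
        # Simplified selection: the best rectangle within each group of equal
        # side-length profile (no convex-hull / Lipschitz-constant test).
        best_in_group = {}
        for idx, (val, _, levels) in enumerate(rects):
            key = tuple(sorted(levels))
            if key not in best_in_group or val < rects[best_in_group[key]][0]:
                best_in_group[key] = idx

        # Pop from the highest index down so lower indices stay valid.
        for idx in sorted(best_in_group.values(), reverse=True):
            if evals + 2 > max_evals:
                break
            val, c, levels = rects.pop(idx)
            i = levels.index(min(levels))  # first longest side
            delta = 3.0 ** -(levels[i] + 1)  # offset from old centre to new centres
            new_levels = list(levels)
            new_levels[i] += 1
            for sign in (-1.0, 1.0):
                x = list(c)
                x[i] += sign * delta
                fx = f(x)
                evals += 1
                rects.append((fx, x, list(new_levels)))
                if fx < best_val:
                    best_val, best_x = fx, list(x)
            # The middle third keeps the old centre and its known value.
            rects.append((val, c, new_levels))

    return best_x, best_val
```

The loop contains no random sampling, so repeated runs with the same budget query the same points in the same order. This matches the abstract's description of a deterministic optimiser, and it makes robustness results reproducible across models.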
