Benchmarking Adversarial Robustness and Adversarial Training Strategies for Object Detection

Alexis Winter; Jean-Vincent Martini; Romaric Audigier; Angelique Loesch; Bertrand Luvison

TL;DR

This work tackles the fragmentation of adversarial robustness research for object detection by introducing a unified benchmark for digital, non-patch attacks. It defines $AP_{loc}$ and CSR to quantify localization and classification effects separately, and it measures attack cost with perceptual metrics such as LPIPS alongside $L_2$ so that cost is reflected more faithfully. The study reveals a cross-architecture robustness gap, with modern attack methods transferring poorly to transformer-based detectors, and shows that adversarial training on a mix of high-perturbation attacks yields the strongest, broad-spectrum defense. Overall, the benchmark enables fair, reproducible comparisons and highlights practical implications for deploying robust detectors in real-world systems.
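As a rough illustration of how pixel-space attack cost can be reported, the following minimal sketch computes the $L_2$ and $L_\infty$ norms of a perturbation and a single-window (global) SSIM, then summarizes each as a mean and standard deviation over a set of image pairs, in the spirit of Figure 4. It assumes images are flat lists of pixel intensities on a 0 to 255 scale. The global SSIM is a simplification of the usual windowed SSIM, and LPIPS is omitted because it requires a pretrained network. The function names are hypothetical, not the benchmark's code.

```python
import math
from statistics import mean, pstdev


def perturbation_norms(clean, adv):
    """Return the (L2, L_inf) norms of the perturbation adv - clean."""
    if len(clean) != len(adv) or not clean:
        raise ValueError("images must be non-empty and have the same size")
    delta = [a - c for c, a in zip(clean, adv)]
    l2 = math.sqrt(sum(d * d for d in delta))
    linf = max(abs(d) for d in delta)
    return l2, linf


def global_ssim(x, y, data_range=255.0):
    """SSIM computed over the whole image as a single window.

    Simplification: standard SSIM averages this quantity over local
    (typically Gaussian-weighted) windows.
    """
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mx, my = mean(x), mean(y)
    var_x = mean((v - mx) ** 2 for v in x)
    var_y = mean((v - my) ** 2 for v in y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (var_x + var_y + c2)
    )


def summarize(pairs):
    """Mean and population std of each metric over (clean, adv) pairs."""
    scores = {"L2": [], "Linf": [], "SSIM": []}
    for clean, adv in pairs:
        l2, linf = perturbation_norms(clean, adv)
        scores["L2"].append(l2)
        scores["Linf"].append(linf)
        scores["SSIM"].append(global_ssim(clean, adv))
    return {name: (mean(vals), pstdev(vals)) for name, vals in scores.items()}


clean = [10, 20, 30, 40]
adv = [13, 16, 30, 40]
print(perturbation_norms(clean, adv))  # (5.0, 4)
print(summarize([(clean, adv), (clean, clean)])["L2"])  # (2.5, 2.5)
```

Two perturbations with the same $L_2$ norm can score very differently on SSIM. This is one reason a benchmark may report several distance and perceptual metrics side by side rather than a single norm.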

Abstract

Object detection models are critical components of automated systems, such as autonomous vehicles and perception-based robots, but their sensitivity to adversarial attacks poses a serious security risk. Progress in defending these models lags behind that in image classification, hindered by a lack of standardized evaluation. Existing work uses different datasets, inconsistent efficiency metrics, and varied measures of perturbation cost, which makes it nearly impossible to compare attack or defense methods thoroughly. This paper addresses this gap by investigating three key questions: (1) How can we create a fair benchmark to impartially compare attacks? (2) How well do modern attacks transfer across different architectures, especially from Convolutional Neural Networks to Vision Transformers? (3) What is the most effective adversarial training strategy for robust defense? To answer these, we first propose a unified benchmark framework focused on digital, non-patch-based attacks. This framework introduces specific metrics to disentangle localization and classification errors and evaluates attack cost using multiple perceptual metrics. Using this benchmark, we conduct extensive experiments on state-of-the-art attacks and a wide range of detectors. Our findings support two major conclusions. First, modern adversarial attacks against object detection models transfer poorly to transformer-based architectures. Second, we demonstrate that the most robust adversarial training strategy uses a dataset composed of a mix of high-perturbation attacks with different objectives (e.g., spatial and semantic), which outperforms training on any single attack.
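The abstract's second conclusion concerns the composition of the adversarial training set. The minimal sketch below assumes that each attack has already been wrapped as a function mapping an image to its perturbed version. It builds a mixed dataset in which each training image is perturbed by exactly one attack from a pool. A seeded shuffle followed by round-robin assignment gives every attack an equal share. The equal-share policy, the function name `build_mixed_adv_dataset`, and the toy attacks are assumptions for illustration, not the paper's procedure.

```python
import random
from collections import Counter


def build_mixed_adv_dataset(images, attacks, seed=0):
    """Perturb each image with exactly one attack from the pool.

    Images are shuffled and then assigned to attacks in round-robin order,
    so every attack contributes (almost) the same number of samples.
    Returns a list of (adversarial_image, attack_name) aligned with `images`.
    """
    if not attacks:
        raise ValueError("the attack pool must not be empty")
    names = sorted(attacks)  # fixed order for reproducibility
    order = list(range(len(images)))
    random.Random(seed).shuffle(order)
    assigned = [None] * len(images)
    for rank, idx in enumerate(order):
        assigned[idx] = names[rank % len(names)]
    return [(attacks[name](img), name) for img, name in zip(images, assigned)]


def _clip(v):
    """Keep a pixel value in the valid 0..255 range."""
    return max(0, min(255, v))


# Hypothetical toy stand-ins for attacks with different objectives;
# real attacks (e.g. EBAD, OSFD) would optimize against a detector.
toy_attacks = {
    "brighten": lambda img: [_clip(p + 8) for p in img],
    "darken": lambda img: [_clip(p - 8) for p in img],
    "invert_lsb": lambda img: [p ^ 1 for p in img],
}

images = [[i, i + 1, i + 2] for i in range(0, 60, 10)]
dataset = build_mixed_adv_dataset(images, toy_attacks, seed=42)
print(Counter(name for _, name in dataset))  # each attack appears twice
```

In adversarial training, the detector would then typically be fine-tuned on these perturbed images paired with the original ground-truth boxes and labels. Independent random draws could leave some attack objective underrepresented in a small dataset, whereas the round-robin split guarantees balanced shares.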


Paper Structure (51 sections, 8 equations, 6 figures, 3 tables)


Acknowledgments
Introduction
Landscape Analysis: Taxonomy, Method Review and Evaluation Gaps
Preliminaries and terminology
Formal definition of adversarial examples
A taxonomy of attacks and models for object detection
Other definitions
A review of adversarial attacks
Object mislabeling
Object vanishing
Object fabrication
Random output
Other outputs
A review of adversarial defense methods
Adversarial training and training modification
...and 36 more sections

Figures (6)

Figure 1: A taxonomy of adversarial attacks in object detection.
Figure 2: Inference results for different attack outcomes. From left to right: clean image, object mislabeling attack (EBAD), random output attack (OSFD), object vanishing attack, and object fabrication attack (PhantomSponges).
Figure 3: Threat models for the selected attacks. This diagram illustrates the source models and knowledge access required for generating each attack. $D_s$ denotes the surrogate models used for attack generation, while $D_v$ represents the victim model queried during the optimization of grey-box attacks (e.g., EBAD). $D_t$ refers to the target model used for final evaluation.
Figure 4: Quantitative analysis of attack imperceptibility on the VOC2007 test set. The plots illustrate the mean and standard deviation of the perturbations across various distance ($L_2$, $L_\infty$) and perceptual (SSIM, LPIPS) metrics.
Figure 5: Visual samples from VOC2007 test set and their perturbed versions. Top row (left to right): benign image, CAA ($\epsilon=10, 30$), and EBAD on YOLOv3 ($\epsilon=10, 30$). Bottom row (left to right): EBAD on YOLOv3 ($\epsilon=50$), EBAD on Faster R-CNN ($\epsilon=10$), OSFD on YOLOv3 and Faster R-CNN ($\epsilon=5$), and PhantomSponges ($\epsilon=70$). Perturbations best viewed with zoom at 300%.
...and 1 more figure
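Figure 5 indexes the perturbed samples by a budget ε. The caption does not say which norm or pixel scale each ε refers to, and the attacks shown may differ on this point. Assuming ε bounds either the $L_\infty$ or the $L_2$ norm of the perturbation on a 0 to 255 scale, the minimal sketch below shows the two standard projections used to enforce such a budget, followed by clipping to the valid pixel range. The function names are illustrative only.

```python
import math


def project_linf(delta, eps):
    """Clip each component of the perturbation to [-eps, eps]."""
    return [max(-eps, min(eps, d)) for d in delta]


def project_l2(delta, eps):
    """Rescale the perturbation onto the L2 ball of radius eps if it lies outside."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(delta)
    scale = eps / norm
    return [d * scale for d in delta]


def apply_budget(clean, delta, eps, norm="linf", lo=0, hi=255):
    """Project delta onto the budget, then keep clean + delta in [lo, hi]."""
    if norm == "linf":
        delta = project_linf(delta, eps)
    elif norm == "l2":
        delta = project_l2(delta, eps)
    else:
        raise ValueError(f"unsupported norm: {norm}")
    # Clipping moves each pixel toward its clean value, so the
    # perturbation stays within the budget after this step.
    return [max(lo, min(hi, c + d)) for c, d in zip(clean, delta)]


print(project_linf([20, -3, 5], 10))          # [10, -3, 5]
print(project_l2([3.0, 4.0], 2.5))            # [1.5, 2.0]
print(apply_budget([250, 5], [20, -20], 10))  # [255, 0]
```

Under this reading, a larger ε admits a stronger, more visible perturbation, which is consistent with the caption's progression from ε=10 to ε=50 for EBAD on YOLOv3.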
