Distortion-Aware Adversarial Attacks on Bounding Boxes of Object Detectors

Pham Phuc, Son Vuong, Khang Nguyen, Tuan Dang

TL;DR

The paper tackles the vulnerability of modern object detectors to adversarial perturbations by introducing a distortion-aware, iterative gradient-based attack that selectively perturbs bounding-box regions via object masks and detector loss. By enforcing a controllable distortion metric and targeting per-box confidences, the method achieves high attack success across diverse architectures (YOLOv8, Faster R-CNN, RetinaNet, Swin Transformer) on COCO 2017 and VOC 2012, with white-box and black-box success rates reaching up to 100% and 98%, respectively. The work demonstrates strong cross-model and cross-domain transferability, supported by Grad-CAM analyses that reveal shifted attention and reduced confidence in attacked regions. The findings underscore practical risks to real-world detection systems and contribute a robust framework for evaluating and improving detector robustness, with code released to foster defense research.

Abstract

Deep learning-based object detection has become ubiquitous in the last decade due to its high accuracy in many real-world applications. With this growing trend, these models have become attractive targets for adversaries, yet most existing results concern classifiers, which do not match the context of practical object detection. In this work, we propose a novel method to fool object detectors, expose the vulnerability of state-of-the-art detectors, and encourage later work to build detectors that are more robust to adversarial examples. Our method generates adversarial images by perturbing object confidence scores during training, which is crucial in predicting the confidence for each class in the testing phase. Herein, we provide a more intuitive technique that embeds additive noise based on the detected objects' masks and the training loss, with distortion control over the original image, by leveraging the gradient of iterative images. To verify the proposed method, we perform adversarial attacks against different object detectors, including recent state-of-the-art models such as YOLOv8, Faster R-CNN, RetinaNet, and Swin Transformer. We also evaluate our technique on the MS COCO 2017 and PASCAL VOC 2012 datasets and analyze the trade-off between attack success rate and image distortion. Our experiments show that the achievable attack success rate is up to 100% for white-box attacks and up to 98% for black-box attacks. The source code and relevant documentation for this work are available at the following link: https://github.com/anonymous20210106/attack_detector
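
The abstract describes the attack only at a high level: additive noise restricted to the detected objects' masks, driven iteratively by a gradient, and bounded by a distortion budget. The following is a minimal sketch of that idea and not the authors' implementation. Everything named here is hypothetical: `toy_confidence` is a linear-plus-sigmoid stand-in for a real detector's box confidence, the relative L2 norm in `distortion` is an assumed metric, and the signed-gradient step is a common iterative update swapped in for illustration.

```python
import numpy as np


def box_mask(shape, boxes):
    """Binary mask that is 1 inside every (x1, y1, x2, y2) box and 0 elsewhere."""
    mask = np.zeros(shape, dtype=np.float64)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2, x1:x2] = 1.0
    return mask


def toy_confidence(image, weights, bias):
    """Hypothetical stand-in for a detector's box confidence (sigmoid of a linear score)."""
    score = np.sum(weights * image) + bias
    return 1.0 / (1.0 + np.exp(-score))


def toy_confidence_grad(image, weights, bias):
    """Analytic gradient of toy_confidence with respect to the image pixels."""
    c = toy_confidence(image, weights, bias)
    return c * (1.0 - c) * weights


def distortion(original, adversarial):
    """Assumed metric: L2 norm of the perturbation relative to the L2 norm of the original."""
    return np.linalg.norm(adversarial - original) / np.linalg.norm(original)


def masked_attack(image, boxes, weights, bias,
                  conf_threshold=0.5, max_distortion=0.1,
                  step=0.01, max_iters=200):
    """Lower the toy confidence by perturbing only pixels inside the boxes.

    Stops when the confidence falls below conf_threshold, when the next step
    would exceed max_distortion, or after max_iters iterations.
    """
    mask = box_mask(image.shape, boxes)
    adv = image.copy()
    for _ in range(max_iters):
        if toy_confidence(adv, weights, bias) < conf_threshold:
            break  # the box would no longer be reported
        grad = toy_confidence_grad(adv, weights, bias)
        # Signed-gradient descent on the confidence, applied inside the mask only.
        candidate = np.clip(adv - step * np.sign(grad) * mask, 0.0, 1.0)
        if distortion(image, candidate) > max_distortion:
            break  # respect the distortion budget
        adv = candidate
    return adv


# Toy setup: a flat gray image with one "object" in the central box.
image = np.full((16, 16), 0.5)
boxes = [(4, 4, 12, 12)]
weights = np.zeros((16, 16))
weights[4:12, 4:12] = 0.1
bias = -1.0

adv_tight = masked_attack(image, boxes, weights, bias, max_distortion=0.1)
adv_loose = masked_attack(image, boxes, weights, bias, max_distortion=0.5)
```

In this toy setup the tighter budget stops the loop before the confidence crosses the threshold, while the looser budget lets the attack succeed, which mirrors the trade-off between attack success rate and image distortion that the abstract mentions. Against a real detector, the analytic gradient would be replaced by automatic differentiation of the detector's loss.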

Paper Structure

This paper contains 24 sections, 14 equations, 10 figures, 5 tables, 1 algorithm.

Figures (10)

  • Figure 1: Distortion-aware adversarial attacks on the bounding boxes of an object detector can perturb a sequence of images taken from a surveillance camera, adding a controllable amount of distortion to reach a given attack success rate and effectively disabling the object detector. The demonstration video of the illustrated sequences is available at https://youtu.be/y_sQqECMJIk.
  • Figure 2: Illustration of an adversarial attack with decision boundaries formed by $k$ discriminant functions: the attacker seeks an alternative $x$ that is similar to $x_0$ such that $g_i(x) < g_t(x_0)$ for $i=1,2,\dots,k$ and $t \neq i$, so that the model $f$ classifies $x$ as $t$. An untargeted attack seeks $x$ such that the model $f$ classifies $x$ as any class $C_j$ with $j \neq i$. In this example, we choose $t=5$ and $i=3$.
  • Figure 3: The convergence of loss over 120 iterations on a subset of images from the MS COCO 2017 dataset.
  • Figure 4: Relationship between attack success rate and target distortion for detection models with a confidence threshold of $0.75$.
  • Figure 5: Relationship between confidence score and distortion at an attack success rate of $97\%$ on YOLOv8 models of various sizes.
  • ...and 5 more figures