Distortion-Aware Adversarial Attacks on Bounding Boxes of Object Detectors

Pham Phuc; Son Vuong; Khang Nguyen; Tuan Dang

TL;DR

The paper tackles the vulnerability of modern object detectors to adversarial perturbations by introducing a distortion-aware, iterative gradient-based attack that selectively perturbs bounding-box regions via object masks and detector loss. By enforcing a controllable distortion metric and targeting per-box confidences, the method achieves high attack success across diverse architectures (YOLOv8, Faster R-CNN, RetinaNet, Swin Transformer) on COCO 2017 and VOC 2012, with white-box and black-box success rates reaching up to 100% and 98%, respectively. The work demonstrates strong cross-model and cross-domain transferability, supported by Grad-CAM analyses that reveal shifted attention and reduced confidence in attacked regions. The findings underscore practical risks to real-world detection systems and contribute a robust framework for evaluating and improving detector robustness, with code released to foster defense research.
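
The mechanism described above (iterative gradient steps on per-box confidences, confined to a box mask and stopped by a distortion budget) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The detector is a hypothetical toy in which each box's confidence is a logistic function of a weighted sum of that box's pixels, chosen so the gradient can be written by hand; with a real detector the gradient would come from automatic differentiation through the detector's loss. The names `box_scores`, `loss_grad`, `box_mask`, `masked_attack`, the relative-L2 distortion metric, and the parameters `alpha`, `budget`, and `thresh` are all illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def box_scores(x, boxes, w, b):
    """Hypothetical toy detector: each box's confidence is a logistic function
    of a weighted sum of the pixels inside that box."""
    return np.array([sigmoid(np.sum(w[y0:y1, x0:x1] * x[y0:y1, x0:x1]) + b)
                     for (y0, x0, y1, x1) in boxes])


def loss_grad(x, boxes, w, b):
    """Gradient of the loss L = sum_k s_k (summed box confidences) w.r.t. x."""
    g = np.zeros_like(x)
    for (y0, x0, y1, x1), s in zip(boxes, box_scores(x, boxes, w, b)):
        g[y0:y1, x0:x1] += s * (1.0 - s) * w[y0:y1, x0:x1]
    return g


def box_mask(shape, boxes):
    """Binary mask that is 1 inside any detected box and 0 elsewhere."""
    m = np.zeros(shape)
    for (y0, x0, y1, x1) in boxes:
        m[y0:y1, x0:x1] = 1.0
    return m


def distortion(x_adv, x):
    """Relative L2 distortion; an assumed stand-in for the paper's metric."""
    return np.linalg.norm(x_adv - x) / np.linalg.norm(x)


def masked_attack(x, boxes, w, b, alpha=0.01, budget=0.1, thresh=0.5, max_iter=200):
    """Signed-gradient descent on box confidences, restricted to the box mask
    and stopped before the distortion budget would be exceeded."""
    mask = box_mask(x.shape, boxes)
    x_adv = x.copy()
    for _ in range(max_iter):
        if np.all(box_scores(x_adv, boxes, w, b) < thresh):
            break  # every box is now below the detection threshold
        step = alpha * np.sign(loss_grad(x_adv, boxes, w, b)) * mask
        candidate = np.clip(x_adv - step, 0.0, 1.0)
        if distortion(candidate, x) > budget:
            break  # the next step would exceed the distortion budget
        x_adv = candidate
    return x_adv


# Toy usage: random grayscale "image" with two boxes (all values hypothetical).
rng = np.random.default_rng(0)
x = rng.uniform(0.2, 0.8, size=(32, 32))
w = rng.uniform(0.0, 0.1, size=(32, 32))
b = 0.0
boxes = [(4, 4, 12, 12), (18, 16, 28, 26)]

x_adv = masked_attack(x, boxes, w, b, budget=0.1)
print("scores before:", box_scores(x, boxes, w, b))
print("scores after: ", box_scores(x_adv, boxes, w, b))
print("distortion:   ", distortion(x_adv, x))
```

Pixels outside the boxes are never modified because the step is multiplied by the mask. In this toy configuration the budget check, rather than the threshold, ends the loop; raising `budget` admits more iterations and therefore lower box scores.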

Abstract

Deep learning-based object detection has become ubiquitous in the last decade due to its high accuracy in many real-world applications. With this growing trend, these models have become attractive targets for adversaries, yet most existing attack results concern classifiers, which do not match the setting of practical object detection. In this work, we propose a novel method to fool object detectors, expose the vulnerability of state-of-the-art detectors, and encourage future work on building detectors that are more robust to adversarial examples. Our method generates adversarial images by perturbing object confidence scores during training, as these scores are crucial for predicting the confidence of each class at test time. Herein, we provide a more intuitive technique that embeds additive noise based on the detected objects' masks and the training loss, with distortion control over the original image, by leveraging the gradients of the iteratively updated images. To verify the proposed method, we perform adversarial attacks against different object detectors, including state-of-the-art models such as YOLOv8, Faster R-CNN, RetinaNet, and Swin Transformer. We also evaluate our technique on the MS COCO 2017 and PASCAL VOC 2012 datasets and analyze the trade-off between attack success rate and image distortion. Our experiments show that the achievable attack success rate is up to 100% for white-box attacks and up to 98% for black-box attacks. The source code and relevant documentation for this work are available at https://github.com/anonymous20210106/attack_detector
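
One simple way to tabulate the trade-off between attack success rate and image distortion is to record, for each image, the smallest distortion at which the attack first fooled the detector, then report the fraction of images fooled within each distortion budget. The bookkeeping below, the function name `success_rate_vs_budget`, and the per-image values are hypothetical illustrations, not the paper's evaluation protocol or results.

```python
import numpy as np


def success_rate_vs_budget(min_distortions, budgets):
    """Fraction of images whose attack succeeds within each distortion budget.

    min_distortions[i] is the smallest distortion at which image i was fooled,
    or np.inf if the attack never succeeded (an assumed bookkeeping scheme).
    """
    d = np.asarray(min_distortions, dtype=float)
    return np.array([np.mean(d <= eps) for eps in budgets])


# Hypothetical per-image results for five images (illustrative only).
min_dist = [0.02, 0.05, 0.08, 0.15, np.inf]
budgets = [0.01, 0.05, 0.10, 0.20]
rates = success_rate_vs_budget(min_dist, budgets)

for eps, r in zip(budgets, rates):
    print(f"budget {eps:.2f}: success rate {r:.0%}")
```

By construction the rate is non-decreasing in the budget, so a single success rate is only meaningful together with the distortion budget at which it was measured.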
