Network transferability of adversarial patches in real-time object detection

Jens Bayer, Stefan Becker, David Münch, Michael Arens

TL;DR

This work addresses the problem of how adversarial patches designed for real-time object detectors transfer across architectures and datasets. It conducts an extensive empirical study by training 280 patches on INRIA and evaluating them on 28 detectors pretrained on COCO, using a patch-optimization procedure that follows prior methods while tailoring loss components to different architectures. The key findings show that patches optimized with larger models tend to transfer more effectively across networks, with YOLOv9 and YOLOv10 delivering the strongest cross-model impact, while YOLO-NAS and RT-DETR exhibit greater robustness. The results highlight cross-architecture vulnerabilities in modern real-time detectors and suggest directions for defense, including further investigation into grayscale patch effects tied to training-time padding practices.

Abstract

Adversarial patches in computer vision can be used to fool deep neural networks and manipulate their decision-making process. Among the most prominent applications of adversarial patches are evasion attacks on object detectors. By covering parts of objects of interest, these patches suppress detections and thus make the target object 'invisible' to the object detector. Since these patches are usually optimized on a specific network with a specific training dataset, their transferability across multiple networks and datasets is not guaranteed. This paper addresses these issues and investigates transferability across numerous object detector architectures. Our extensive evaluation across various models on two distinct datasets indicates that patches optimized with larger models provide better network transferability than patches optimized with smaller models.
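Transferability is reported in Figure 1 as a matrix: for each pair of patch-optimization network and evaluated network, the relative mAP drop averaged over a set of patches. The sketch below shows one way such a matrix could be aggregated. It assumes the relative drop is (clean mAP - patched mAP) / clean mAP, which this summary does not state explicitly; the helper names and the model names and mAP values are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def relative_map_drop(clean_map: float, patched_map: float) -> float:
    """Relative mAP drop; assumed definition: (clean - patched) / clean."""
    if clean_map <= 0:
        raise ValueError("clean mAP must be positive")
    return (clean_map - patched_map) / clean_map


def compatibility_matrix(clean: dict, patched: dict) -> dict:
    """Mean relative mAP drop per (source network, target network) pair.

    clean:   {target: mAP of the target network on the unpatched test set}
    patched: {(source, target): [mAP of target with each patch optimized on source]}
    """
    return {
        (source, target): mean(relative_map_drop(clean[target], m) for m in maps)
        for (source, target), maps in patched.items()
    }


# Hypothetical toy values, for illustration only (not results from the paper).
clean = {"model_a": 0.80, "model_b": 0.90}
patched = {
    ("model_a", "model_a"): [0.20, 0.40],  # patch evaluated on the model it was optimized with
    ("model_a", "model_b"): [0.72, 0.54],  # transfer: patch evaluated on a different model
}
matrix = compatibility_matrix(clean, patched)
for (source, target), drop in sorted(matrix.items()):
    print(f"{source} -> {target}: {drop:.3f}")
```

In this toy setting the patch loses much of its effect when moved to another model (0.625 versus 0.3), which is the kind of gap the compatibility matrix is meant to expose.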

Paper Structure (12 sections, 4 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: Compatibility matrix of the networks evaluated on the INRIA Person test set. Each cell represents the mean average precision (mAP) drop of a set of patches on the test set for a specific network. The brighter the color, the higher the relative mAP drop of the patch.
  • Figure 2: Embedded Inceptionv3 features of the patches via t-SNE. The color of each data point corresponds to the origin, the symbol to the general architecture group, and the size of the symbol to the relative mAP drop. The mAP drop refers to the INRIA Person test set.
  • Figure 3: Patches that have been optimized with the YOLOv8-s network. Certain similarities in terms of color and shape are clearly visible.
  • Figure 4: Histograms of the RGB channels. The first column shows the histograms of a single patch, the second column the histograms of all patches optimized with the YOLOv8-s network, and the third column the histograms of all optimized patches, regardless of the network used to optimize them.
  • Figure 5: Histograms of the HSV channels. As in Figure 4, the first column shows the histograms of a single patch, the second the histograms of all patches optimized with the YOLOv8-s network, and the third the histograms of all optimized patches.
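Figures 4 and 5 compare per-channel color histograms at three levels: a single patch, all patches from one network, and all patches. The following is a minimal sketch of such an analysis using only Python's standard `colorsys` module. The pixel format, the bin count, and the helper names `channel_histograms` and `aggregate` are hypothetical choices, not the paper's tooling. A patch dominated by gray tones, as discussed for the grayscale patch effects, concentrates its mass in the lowest saturation bins.

```python
import colorsys

CHANNELS = ("R", "G", "B", "H", "S", "V")


def channel_histograms(pixels, bins=16):
    """Per-channel RGB and HSV histograms of one patch.

    pixels: iterable of (r, g, b) tuples with integer values in 0..255.
    Returns {channel name: list of bin counts}.
    """
    hist = {name: [0] * bins for name in CHANNELS}
    for r, g, b in pixels:
        rgb = (r / 255.0, g / 255.0, b / 255.0)
        hsv = colorsys.rgb_to_hsv(*rgb)  # h, s, v, each in [0, 1]
        for name, value in zip(CHANNELS, rgb + hsv):
            # Map a value in [0, 1] to a bin index; exactly 1.0 falls into the last bin.
            hist[name][min(int(value * bins), bins - 1)] += 1
    return hist


def aggregate(histograms):
    """Sum per-channel histograms, e.g. over all patches from one network."""
    total = {}
    for hist in histograms:
        for name, counts in hist.items():
            acc = total.setdefault(name, [0] * len(counts))
            for i, count in enumerate(counts):
                acc[i] += count
    return total


# Toy patches: a uniform mid-gray patch and a uniform red patch.
gray_patch = [(128, 128, 128)] * 4
red_patch = [(255, 0, 0)] * 4
combined = aggregate([channel_histograms(gray_patch), channel_histograms(red_patch)])
print("S:", combined["S"])  # gray pixels land in the lowest saturation bin
```

Pooling raw counts with `aggregate` mirrors the second and third columns of the figures. Normalizing each histogram first would instead weight every patch equally regardless of its size.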