Patch of Invisibility: Naturalistic Physical Black-Box Adversarial Attacks on Object Detectors

Raz Lapid; Eylon Mizrahi; Moshe Sipper


TL;DR

This work tackles realistic adversarial threats to object detectors under limited attacker access by introducing BBNP, a black-box, gradient-free method that generates universal, naturalistic patches via a pretrained GAN's latent space. It optimizes a total loss $L_{total} = L_{det} + \lambda_{tv} L_{tv} + \lambda_{cls} L_{cls}$ using an Evolution Strategy to adjust a latent vector in a compact $d=120$-dimensional space, producing patches applied to $25\%$ of each bounding box. Evaluated on INRIA with five detectors, BBNP outperforms all black-box baselines and even rivals some white-box attacks, while patches score higher on human naturalness assessments and show strong transferability across models. The results highlight significant real-world risks to object-detection systems and motivate ongoing development of robust defenses and ethical guidelines for deployment. Key contributions include a model-agnostic, patch-based black-box attack that operates in GAN latent space, a compact and efficient optimization framework, comprehensive transferability analysis, and subjective assessments of patch realism, all demonstrated across multiple detectors and scenarios.
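The way the three terms of $L_{total}$ combine can be illustrated with a minimal sketch. It assumes an anisotropic, mean-normalized form of total variation and treats $L_{det}$ and $L_{cls}$ as scalars already computed from the detector and classifier outputs. The function names and the value `lambda_tv=0.1` are placeholders chosen for illustration, not values taken from the paper. `lambda_cls=0.2` mirrors the setting quoted in the Figure 5 caption.

```python
import numpy as np

def total_variation(patch):
    """Mean absolute difference between neighboring pixels of an (H, W, C) patch.

    Assumed (anisotropic) form of L_tv; the paper may normalize differently.
    """
    dh = np.abs(patch[1:, :, :] - patch[:-1, :, :]).mean()  # vertical neighbors
    dw = np.abs(patch[:, 1:, :] - patch[:, :-1, :]).mean()  # horizontal neighbors
    return float(dh + dw)

def total_loss(l_det, patch, l_cls, lambda_tv=0.1, lambda_cls=0.2):
    """L_total = L_det + lambda_tv * L_tv + lambda_cls * L_cls.

    l_det and l_cls are hypothetical scalars obtained by querying the detector
    and the classifier; lambda_tv is a placeholder weight.
    """
    return l_det + lambda_tv * total_variation(patch) + lambda_cls * l_cls

# A flat patch has zero total variation, so only L_det and L_cls contribute.
flat = np.full((8, 8, 3), 0.5)
# A noisy patch is penalized by the smoothness term.
noisy = np.random.default_rng(0).random((8, 8, 3))
```

With identical detection and classification losses, the noisy patch receives a strictly larger total loss than the flat one, which is the smoothing pressure the $L_{tv}$ term is meant to apply.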

Abstract

Adversarial attacks on deep learning models have received increased attention in recent years. Work in this area has mostly focused on gradient-based techniques, so-called 'white-box' attacks, where the attacker has access to the targeted model's internal parameters; such an assumption is usually untenable in the real world. Additionally, some attacks use the entire pixel space to fool a given model, which is neither practical nor physical. To address these problems we propose the BBNP algorithm (Black-Box Naturalistic Patch): a direct, black-box, naturalistic, gradient-free method that uses the learned image manifold of a pretrained generative adversarial network (GAN) to generate naturalistic adversarial patches for object detectors. The method performs model-agnostic, black-box, naturalistic attacks on object detection models by relying solely on the models' outputs. Evaluating our approach on five models against five black-box and two white-box attacks, we show that it achieves state-of-the-art results, outperforming all other tested black-box approaches.
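A gradient-free search that relies solely on model outputs can be sketched with a generic evolution-strategy estimator. The sketch below is an assumption-laden illustration, not the authors' algorithm: it uses antithetic Gaussian sampling (in the style of natural evolution strategies), and `toy_loss` is a hypothetical stand-in for the query "decode latent with the GAN, apply the patch, run the detector, return $L_{total}$". The latent size of 120 and population size of 50 echo values mentioned in the TL;DR and the Figure 5 caption; `sigma`, `lr`, and `steps` are arbitrary.

```python
import numpy as np

def es_gradient(loss_fn, z, rng, pop_size=50, sigma=0.1):
    """Estimate the gradient of loss_fn at z using only function evaluations."""
    half = pop_size // 2                      # antithetic pairs: +eps and -eps
    grad = np.zeros_like(z)
    for eps in rng.standard_normal((half, z.size)):
        diff = loss_fn(z + sigma * eps) - loss_fn(z - sigma * eps)
        grad += diff * eps
    return grad / (2.0 * sigma * half)

def optimize_latent(loss_fn, dim=120, steps=200, lr=0.05, seed=0):
    """Descend the estimated gradient in latent space; no backpropagation needed."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(dim)
    for _ in range(steps):
        z = z - lr * es_gradient(loss_fn, z, rng)
    return z

def toy_loss(z):
    """Hypothetical black-box objective with its minimum at z = 1."""
    return float(np.sum((z - 1.0) ** 2))

z_star = optimize_latent(toy_loss)
```

Each step costs `pop_size` queries to the black box, so the query budget is `steps * pop_size`; searching a low-dimensional latent vector rather than raw pixels is what keeps this budget manageable.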


Paper Structure

This paper contains 16 sections, 8 equations, 8 figures, 1 table, and 1 algorithm.

Introduction
Previous work
Method
Generating adversarial patches
Adversarial gradient estimation
Experiments and Results
Experimental setup
Results Analysis
Transferability Analysis
Subjective Assessment of the Naturalness of Various Adversarial Patches
Ablation Studies
Discussion
Ethical Considerations and Societal Impact
Naturalistic Setting
Limitations
...and 1 more sections

Figures (8)

Figure 1: Naturalistic Black-Box Adversarial Attack: overview of the framework. The system creates patches for object detectors by using the learned image manifold of a pretrained GAN generator ($G$) on real-world images. A pretrained classifier ($C$) steers the optimizer toward a patch that resembles a specific class; the total variation ($TV$) component encourages smooth patches; and the detector ($D$) supplies the actual detection loss. Efficient sampling of GAN images via an iterative evolution strategy ultimately generates the final patch. The snowflake icon denotes frozen model weights.
Figure 2: Images generated utilizing BigGAN, without applying any projection onto the norm of the latent vector.
Figure 3: Digital examples of all tested black-box algorithms, on Tiny-YOLOv3 (top), Tiny-YOLOv4 (middle), and YOLOv5s (bottom).
Figure 4: Naturalness subjective assessments of our adversarial patches in comparison to other baseline methodologies. The naturalness scores represent the participant votes for each test image relative to the entire cohort. As evidenced by the results, our patches garnered a higher mean score than their counterparts.
Figure 5: Patches evolved by our algorithm, on Tiny-YOLOv3, Tiny-YOLOv4, YOLOv5s, SSD-L and L-DETR with $\lambda_{cls}=0.2$. Left to right: population sizes of 50, 70, 90, and 110, respectively.
...and 3 more figures
