Patch of Invisibility: Naturalistic Physical Black-Box Adversarial Attacks on Object Detectors

Raz Lapid, Eylon Mizrahi, Moshe Sipper

TL;DR

This work tackles realistic adversarial threats to object detectors under limited attacker access by introducing BBNP, a black-box, gradient-free method that generates universal, naturalistic patches via a pretrained GAN's latent space. It optimizes a total loss $L_{total} = L_{det} + \lambda_{tv} L_{tv} + \lambda_{cls} L_{cls}$ using an Evolution Strategy to adjust a latent vector in a compact $d=120$-dimensional space, producing patches applied to $25\%$ of each bounding box. Evaluated on INRIA with five detectors, BBNP outperforms all black-box baselines and even rivals some white-box attacks, while patches score higher on human naturalness assessments and show strong transferability across models. The results highlight significant real-world risks to object-detection systems and motivate ongoing development of robust defenses and ethical guidelines for deployment. Key contributions include a model-agnostic, patch-based black-box attack that operates in GAN latent space, a compact and efficient optimization framework, comprehensive transferability analysis, and subjective assessments of patch realism, all demonstrated across multiple detectors and scenarios.
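The optimization summarized above can be illustrated with a minimal sketch. It assumes a generic NES-style evolution strategy, which is not necessarily the variant the authors use. `toy_objective` is a hypothetical stand-in: in the actual method each candidate latent vector would be decoded by the GAN, pasted onto the images, and scored through the detector's outputs. The values of `lambda_tv`, `sigma`, `lr`, and `steps` are illustrative assumptions; `dim=120` comes from the summary, and `lambda_cls=0.2` and `pop_size=50` are taken from the Figure 5 caption.

```python
import numpy as np

def total_loss(l_det, l_tv, l_cls, lambda_tv, lambda_cls):
    # L_total = L_det + lambda_tv * L_tv + lambda_cls * L_cls
    return l_det + lambda_tv * l_tv + lambda_cls * l_cls

def toy_objective(z):
    # Hypothetical stand-in for "decode z with the GAN, apply the patch,
    # query the detector". The three terms only mimic the loss structure.
    l_det = float(np.mean(z ** 2))
    l_tv = float(np.mean(np.abs(np.diff(z))))
    l_cls = float(np.mean(np.abs(z)))
    return total_loss(l_det, l_tv, l_cls, lambda_tv=0.1, lambda_cls=0.2)

def evolve_latent(objective, dim=120, pop_size=50, sigma=0.1,
                  lr=0.02, steps=200, seed=0):
    """Gradient-free minimization of `objective` over a latent vector."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(dim)  # initial latent vector
    for _ in range(steps):
        # Sample a population of Gaussian perturbations around z.
        eps = rng.standard_normal((pop_size, dim))
        # Only objective values are used (black-box queries, no gradients).
        losses = np.array([objective(z + sigma * e) for e in eps])
        # Standardize the losses so the step size is scale-independent.
        adv = (losses - losses.mean()) / (losses.std() + 1e-8)
        # Estimate a descent direction from the weighted perturbations.
        grad_est = eps.T @ adv / (pop_size * sigma)
        z = z - lr * grad_est
    return z

if __name__ == "__main__":
    z_final = evolve_latent(toy_objective)
    print("final toy loss:", toy_objective(z_final))
```

Searching a 120-dimensional latent space instead of raw pixels keeps the number of parameters small, which is what makes a query-only search of this kind feasible.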

Abstract

Adversarial attacks on deep learning models have received increased attention in recent years. Work in this area has mostly focused on gradient-based techniques, so-called 'white-box' attacks, where the attacker has access to the targeted model's internal parameters; such an assumption is usually untenable in the real world. Additionally, some attacks use the entire pixel space to fool a given model, which is neither practical nor physical. To address these problems we propose the BBNP algorithm (Black-Box Naturalistic Patch): a direct, black-box, naturalistic, gradient-free method that uses the learned image manifold of a pretrained generative adversarial network (GAN) to generate naturalistic adversarial patches for object detectors. The method is model-agnostic: it attacks object detection models by relying solely on their outputs. Evaluating our approach on five models against five black-box and two white-box attacks, we show that it achieves state-of-the-art results, outperforming all other tested black-box approaches.

Paper Structure (16 sections, 8 equations, 8 figures, 1 table, 1 algorithm)

Figures (8)

  • Figure 1: Naturalistic Black-Box Adversarial Attack: Overview of framework. The system creates patches for object detectors by using the learned image manifold of a pretrained GAN generator ($G$) on real-world images. A pretrained classifier ($C$) pushes the optimizer toward a patch that resembles a specific class; the total variation ($TV$) component encourages smooth patches; and the detector ($D$) supplies the actual detection loss. Efficient sampling of the GAN images via an iterative evolution strategy ultimately generates the final patch. The snowflake icon means frozen model weights.
  • Figure 2: Images generated utilizing BigGAN, without applying any projection onto the norm of the latent vector.
  • Figure 3: Digital examples of all tested black-box algorithms, on Tiny-YOLOv3 (top), Tiny-YOLOv4 (middle), and YOLOv5s (bottom).
  • Figure 4: Subjective naturalness assessments of our adversarial patches compared with baseline methods. The naturalness scores represent the participant votes for each test image relative to the entire cohort. Our patches received a higher mean score than the baselines.
  • Figure 5: Patches evolved by our algorithm, on Tiny-YOLOv3, Tiny-YOLOv4, YOLOv5s, SSD-L and L-DETR with $\lambda_{cls}=0.2$. Left to right: population sizes of 50, 70, 90, and 110, respectively.
  • ...and 3 more figures
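
Two ingredients named in the captions above, the total variation term (Figure 1) and a projection of the latent vector's norm (Figure 2), can be sketched as follows. This is a minimal illustration under assumptions: the captions do not give the paper's exact formulas, so `total_variation` uses a common anisotropic L1 form, and `clip_latent_norm` is a hypothetical projection that rescales the latent vector onto a ball of radius `max_norm`. Both function names and the `max_norm` parameter are introduced here for illustration only.

```python
import numpy as np

def total_variation(patch):
    """Anisotropic TV of an (H, W, C) patch: mean absolute difference
    between vertically and horizontally adjacent pixels (assumed form)."""
    dh = np.abs(patch[1:, :, :] - patch[:-1, :, :])  # vertical neighbours
    dw = np.abs(patch[:, 1:, :] - patch[:, :-1, :])  # horizontal neighbours
    return float(dh.mean() + dw.mean())

def clip_latent_norm(z, max_norm):
    """Hypothetical projection: rescale z so its L2 norm is at most max_norm."""
    norm = np.linalg.norm(z)
    if norm <= max_norm:
        return z
    return z * (max_norm / norm)

if __name__ == "__main__":
    flat = np.full((8, 8, 3), 0.5)                        # perfectly smooth patch
    noisy = np.random.default_rng(0).random((8, 8, 3))    # rough patch
    print(total_variation(flat), total_variation(noisy))
    print(np.linalg.norm(clip_latent_norm(np.array([3.0, 4.0]), 1.0)))
```

A lower total variation means a smoother patch, and keeping the latent norm bounded keeps the search in the region of latent space the generator was trained on.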