Patch of Invisibility: Naturalistic Physical Black-Box Adversarial Attacks on Object Detectors
Raz Lapid, Eylon Mizrahi, Moshe Sipper
TL;DR
This work tackles realistic adversarial threats to object detectors under limited attacker access by introducing BBNP, a black-box, gradient-free method that generates universal, naturalistic patches via a pretrained GAN's latent space. It optimizes a total loss $L_{total} = L_{det} + \lambda_{tv} L_{tv} + \lambda_{cls} L_{cls}$ using an Evolution Strategy to adjust a latent vector in a compact $d=120$-dimensional space, producing patches applied to $25\%$ of each bounding box. Evaluated on INRIA with five detectors, BBNP outperforms all black-box baselines and even rivals some white-box attacks, while patches score higher on human naturalness assessments and show strong transferability across models. The results highlight significant real-world risks to object-detection systems and motivate ongoing development of robust defenses and ethical guidelines for deployment. Key contributions include a model-agnostic, patch-based black-box attack that operates in GAN latent space, a compact and efficient optimization framework, comprehensive transferability analysis, and subjective assessments of patch realism, all demonstrated across multiple detectors and scenarios.
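As a rough illustration of how the three terms in $L_{total}$ combine, the sketch below computes the weighted sum in Python. The individual terms are not defined in this summary, so the total-variation formula (mean absolute difference between neighbouring pixels), the function names, and the default weights of 0.1 are all assumptions made for illustration, not the authors' settings. `l_det` and `l_cls` are taken as scalars already computed from the detector's outputs.

```python
import numpy as np


def total_variation(patch: np.ndarray) -> float:
    """Assumed smoothness term: mean absolute difference between
    vertically and horizontally adjacent pixels of an (H, W, C) patch."""
    dh = np.abs(patch[1:, :, :] - patch[:-1, :, :]).mean()
    dw = np.abs(patch[:, 1:, :] - patch[:, :-1, :]).mean()
    return float(dh + dw)


def total_loss(l_det: float, l_cls: float, patch: np.ndarray,
               lambda_tv: float = 0.1, lambda_cls: float = 0.1) -> float:
    """L_total = L_det + lambda_tv * L_tv + lambda_cls * L_cls.

    The weights here are placeholder values, not taken from the paper.
    """
    return l_det + lambda_tv * total_variation(patch) + lambda_cls * l_cls


if __name__ == "__main__":
    # A hypothetical 64x64 RGB patch with values in [0, 1].
    rng = np.random.default_rng(0)
    patch = rng.random((64, 64, 3))
    print(total_loss(l_det=0.8, l_cls=0.3, patch=patch))
```

A perfectly flat patch has zero total variation under this definition, so the smoothness term only penalizes patches with abrupt pixel-to-pixel changes.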
Abstract
Adversarial attacks on deep learning models have received increased attention in recent years. Work in this area has mostly focused on gradient-based techniques, so-called 'white-box' attacks, where the attacker has access to the targeted model's internal parameters; such an assumption is usually untenable in the real world. Additionally, some attacks perturb the entire pixel space to fool a given model, which is neither practical nor physically realizable. To address these problems we propose the BBNP algorithm (Black-Box Naturalistic Patch): a direct, black-box, naturalistic, gradient-free method that uses the learned image manifold of a pretrained generative adversarial network (GAN) to generate naturalistic adversarial patches for object detectors. The method is model-agnostic: it attacks object detection models by relying solely on the model's outputs. Evaluating our approach on five models and comparing it with five black-box and two white-box attacks, we show that our proposed method achieves state-of-the-art results, outperforming all other tested black-box approaches.
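To make the idea of a gradient-free search that relies only on model outputs more concrete, here is a minimal sketch of one simple Evolution Strategy variant operating on a 120-dimensional latent vector. It is not the authors' implementation: the specific update rule (standardized losses weighting Gaussian perturbations), all hyperparameters, and the function names are assumptions. `black_box_loss` is a hypothetical toy stand-in for the real pipeline, which would decode the latent vector with the GAN, paste the patch onto images, query the detector, and return a scalar loss.

```python
import numpy as np

LATENT_DIM = 120  # latent dimensionality stated in the summary


def black_box_loss(z: np.ndarray) -> float:
    """Hypothetical stand-in for: GAN decode -> apply patch -> query detector.

    Only a scalar comes back; no gradients are exposed to the optimizer.
    """
    target = np.linspace(-1.0, 1.0, z.size)
    return float(np.mean((z - target) ** 2))


def evolve_latent(loss_fn, z0: np.ndarray, pop_size: int = 50,
                  sigma: float = 0.1, lr: float = 0.05,
                  generations: int = 200, seed: int = 0) -> np.ndarray:
    """Search the latent space using only loss evaluations (queries)."""
    rng = np.random.default_rng(seed)
    z = z0.astype(float).copy()
    for _ in range(generations):
        # Sample a population of Gaussian perturbations around z.
        noise = rng.standard_normal((pop_size, z.size))
        losses = np.array([loss_fn(z + sigma * eps) for eps in noise])
        # Standardize losses so the step size does not depend on loss scale.
        scores = (losses - losses.mean()) / (losses.std() + 1e-8)
        # Weighted sum of perturbations approximates a descent direction.
        direction = noise.T @ scores / (pop_size * sigma)
        z = z - lr * direction
    return z


if __name__ == "__main__":
    z_start = np.zeros(LATENT_DIM)
    z_best = evolve_latent(black_box_loss, z_start)
    print("initial loss:", black_box_loss(z_start))
    print("final loss:  ", black_box_loss(z_best))
```

Each generation costs `pop_size` queries to the black box, so the query budget in this sketch is `pop_size * generations`; searching a compact latent space rather than raw pixels keeps the number of parameters to optimize small.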
