BB-Patch: BlackBox Adversarial Patch-Attack using Zeroth-Order Optimization
Satyadwyoom Kumar, Saurabh Gupta, Arun Balaji Buduru
TL;DR
The paper tackles black-box adversarial patches that can be printed and placed anywhere on an image. It introduces BB-Patch, a patch crafted without access to model gradients by optimizing a patch parameter $z$ with a zeroth-order adaptive momentum method under an Expectation Over Transformations (EOT) objective. The authors report that BB-Patch scales across MNIST, CIFAR-10, and ImageNet and transfers across architectures such as ResNet50, VGG16, and MobileNet: a patch trained on one model still reduces accuracy on the others. A real-world demonstration on a distracted-driving classifier shows the patch can shift predictions from “unsafe” to “safe” in practical settings. The findings suggest that truly black-box, printable patches pose a realistic and significant threat to deployed vision systems.
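To make the idea of gradient-free patch optimization concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a two-point Gaussian random-direction gradient estimate, a plain Adam-style moment update, a tanh mapping from $z$ to pixel values, and an EOT that averages only over random patch locations. The names `apply_patch`, `eot_loss`, `zo_gradient`, `zo_patch_attack`, and `toy_query_loss` are hypothetical, and the "model" is a toy scalar score standing in for a real black-box classifier.

```python
import numpy as np

def toy_query_loss(image):
    """Hypothetical black-box score: mean brightness stands in for the
    true-class confidence returned by a real model query."""
    return float(image.mean())

def apply_patch(image, patch, top, left):
    """Paste `patch` onto a copy of `image` with its corner at (top, left)."""
    out = image.copy()
    ph, pw = patch.shape
    out[top:top + ph, left:left + pw] = patch
    return out

def eot_loss(z, images, locations, query_loss):
    """Mean black-box loss over images and sampled patch locations."""
    patch = 0.5 * (np.tanh(z) + 1.0)  # assumed mapping of z to pixels in [0, 1]
    losses = [query_loss(apply_patch(img, patch, t, l))
              for img in images for (t, l) in locations]
    return float(np.mean(losses))

def zo_gradient(z, loss_fn, rng, mu=0.01, q=50):
    """Two-point random-direction estimate of the gradient of loss_fn at z.
    Uses only loss values (q + 1 queries), never model gradients."""
    base = loss_fn(z)
    grad = np.zeros_like(z)
    for _ in range(q):
        u = rng.standard_normal(z.shape)
        grad += (loss_fn(z + mu * u) - base) / mu * u
    return grad / q

def zo_patch_attack(images, query_loss, patch_shape=(4, 4), steps=60,
                    lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Minimize the EOT loss over z with Adam-style updates on ZO gradients."""
    rng = np.random.default_rng(seed)
    z = np.zeros(patch_shape)
    m = np.zeros_like(z)
    v = np.zeros_like(z)
    h, w = images[0].shape
    ph, pw = patch_shape
    for t in range(1, steps + 1):
        # Sample locations once per step so every query in this step
        # sees the same transformations (lower-variance differences).
        locations = [(int(rng.integers(0, h - ph + 1)),
                      int(rng.integers(0, w - pw + 1))) for _ in range(3)]
        g = zo_gradient(
            z, lambda zz: eot_loss(zz, images, locations, query_loss), rng)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
        z = z - lr * m_hat / (np.sqrt(v_hat) + eps)
    return z
```

In a real attack, `toy_query_loss` would be replaced by a query to the deployed classifier, and the EOT would typically also cover transformations such as rotation, scale, and lighting. Those details are not specified in this summary, so they are left out here.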
Abstract
Deep learning has become popular due to its vast applications in almost all domains. However, models trained using deep learning are prone to failure on adversarial samples and carry considerable risk in sensitive applications. Most adversarial attack strategies assume that the adversary has access to the training data, the model parameters, and the input during deployment, and hence focus on perturbing the pixel-level information present in the input image. Adversarial patches were introduced to expose the vulnerability of deep learning models in a much more pragmatic manner, but there the attacker has white-box access to the model parameters. Recently, there has been an attempt to develop these adversarial attacks using black-box techniques. However, certain assumptions, such as the availability of large training data, are not valid in real-life scenarios. In a real-life scenario, the attacker can only assume the type of model architecture used, from a select list of state-of-the-art architectures, while having access to only a subset of the input dataset. Hence, we propose a black-box adversarial attack strategy that produces adversarial patches which can be applied anywhere in the input image to perform an adversarial attack.
