Simple Black-Box Adversarial Perturbations for Deep Networks
Nina Narodytska, Shiva Prasad Kasiviswanathan
TL;DR
This paper shows that deep CNNs, including modern architectures, are vulnerable to adversarial perturbations even under strict black-box conditions. It introduces two gradient-free attack strategies: a first-pass single-pixel perturbation, which reveals that critical pixels are widespread, and a scalable greedy local search (LocSearchAdv) that perturbs small pixel neighborhoods to induce $k$-misclassification with minimal perturbation. Across datasets from MNIST to ImageNet, LocSearchAdv achieves a higher success rate than the fast gradient sign method (FGSM) while perturbing far fewer pixels, and it remains effective for larger $k$ and under weaker adversary assumptions. The work also links the perturbed pixels to saliency regions without computing gradients, which has practical implications for robustness testing and defense design. Overall, it provides a practical, scalable benchmark for stress-testing CNN robustness in realistic black-box scenarios.
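The notion of $k$-misclassification used above can be made concrete with a short check. This is a minimal sketch that assumes the usual reading: an input is $k$-misclassified when its true label is absent from the network's top $k$ predictions. The function name `is_k_misclassified` and its arguments are hypothetical and do not come from the paper.

```python
import numpy as np

def is_k_misclassified(scores, true_label, k=1):
    """Return True if true_label is not among the k highest-scoring classes.

    Assumption: `scores` is the vector of class scores (or probabilities)
    returned by the black-box network for a single input.
    """
    scores = np.asarray(scores, dtype=float)
    # Indices of the k largest scores, highest first.
    top_k = np.argsort(-scores, kind="stable")[:k]
    return int(true_label) not in top_k.tolist()

# True label 2 is ranked second, so it is misclassified for k=1 but not k=2.
example_scores = [0.1, 0.6, 0.3]
k1_result = is_k_misclassified(example_scores, true_label=2, k=1)
k2_result = is_k_misclassified(example_scores, true_label=2, k=2)
```

With $k = 1$ this reduces to ordinary misclassification; larger $k$ is a stricter success criterion for the attacker.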
Abstract
Deep neural networks are powerful and popular learning models that achieve state-of-the-art pattern recognition performance on many computer vision, speech, and language processing tasks. However, these networks have also been shown to be susceptible to carefully crafted adversarial perturbations which force misclassification of the inputs. Adversarial examples enable adversaries to subvert the expected system behavior, leading to undesired consequences, and could pose a security risk when these systems are deployed in the real world. In this work, we focus on deep convolutional neural networks and demonstrate that adversaries can easily craft adversarial examples even without any internal knowledge of the target network. Our attacks treat the network as an oracle (black-box) and only assume that the output of the network can be observed on the probed inputs. Our first attack is based on the simple idea of adding a perturbation to a randomly selected single pixel or a small set of pixels. We then improve the effectiveness of this attack by carefully constructing a small set of pixels to perturb, using the idea of greedy local search. Our proposed attacks also naturally extend to a stronger notion of misclassification. Our extensive experimental results illustrate that even these elementary attacks can reveal a deep neural network's vulnerabilities. The simplicity and effectiveness of our proposed schemes mean that they could serve as a litmus test for designing robust networks.
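The two attack ideas described in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' exact procedure: pixel values are assumed to lie in [0, 1], a perturbation pushes a pixel to the bound farther from its current value, and the oracle returns a vector of class probabilities. All names (`perturb`, `random_pixel_attack`, `greedy_local_search`, `toy_oracle`) and all parameter defaults are hypothetical.

```python
import numpy as np

def perturb(image, r, c, low=0.0, high=1.0):
    """Copy of image with pixel (r, c) pushed to the bound farther from its value."""
    out = image.copy()
    out[r, c] = np.where(out[r, c] < (low + high) / 2, high, low)
    return out

def random_pixel_attack(oracle, image, true_label, max_queries=100, seed=0):
    """Perturb one randomly chosen pixel per query until the top label changes."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    for query in range(1, max_queries + 1):
        candidate = perturb(image, rng.integers(h), rng.integers(w))
        if int(np.argmax(oracle(candidate))) != true_label:
            return candidate, query
    return None, max_queries

def greedy_local_search(oracle, image, true_label, rounds=10,
                        per_round=2, radius=1, n_start=8, seed=0):
    """Greedily perturb the pixels that most reduce the true-class probability."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    current, changed = image.copy(), set()
    flat = rng.choice(h * w, size=min(n_start, h * w), replace=False)
    candidates = {(int(i) // w, int(i) % w) for i in flat}
    for _ in range(rounds):
        if not candidates:
            break
        # One oracle query per candidate: true-class probability after perturbing it.
        scored = sorted(sorted(candidates),
                        key=lambda p: oracle(perturb(current, *p))[true_label])
        chosen = scored[:per_round]
        for r, c in chosen:
            current = perturb(current, r, c)
        changed |= set(chosen)
        if int(np.argmax(oracle(current))) != true_label:
            return current
        # Next round: search the neighborhoods of the pixels just perturbed,
        # skipping pixels that were already changed (a simplifying assumption).
        candidates = {(rr, cc) for r, c in chosen
                      for rr in range(max(0, r - radius), min(h, r + radius + 1))
                      for cc in range(max(0, c - radius), min(w, c + radius + 1))}
        candidates -= changed
    return None

def toy_oracle(img):
    """Hypothetical stand-in for a network: class 1 wins once two pixels are bright."""
    p = min(1.0, float((img > 0.5).sum()) / 3.0)
    return np.array([1.0 - p, p])

clean = np.zeros((5, 5))
single_adv, queries_used = random_pixel_attack(toy_oracle, clean, 0, max_queries=20)
greedy_adv = greedy_local_search(toy_oracle, clean, 0)
```

On this toy oracle a single perturbed pixel is never enough, so the random attack exhausts its query budget, while the greedy search succeeds by perturbing two pixels. Neither function touches gradients or model internals; both rely only on the oracle's outputs for probed inputs.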
