A Generative Approach to Surrogate-based Black-box Attacks
Raha Moraffah, Huan Liu
TL;DR
This work tackles black-box adversarial attacks by replacing discriminative surrogates with a GAN-based generative surrogate that learns the distribution of samples lying on or near the target model's decision boundaries. The two-stage GSBA framework first trains the generator to produce realistic, boundary-proximate, and class-controlled samples, supplemented by knowledge distillation from the unknown target. It then exploits this distribution by selecting the generated example closest to the original input, which yields an adversarial misclassification in a single step with strong attack success rates and high query efficiency. Empirical results on CIFAR-10/100 show substantial improvements over state-of-the-art surrogate-based attacks, and ablations and query-budget analyses support the method's practicality and robustness.
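The selection step in the second stage can be illustrated with a minimal sketch. It assumes the generator has already produced a pool of candidates with known class labels, and it uses Euclidean distance as the closeness measure; the function names, the toy two-dimensional data, and the distance choice are all hypothetical illustrations, not the authors' implementation.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_closest_adversarial(original, original_label, candidates, candidate_labels):
    """Return (index, distance) of the nearest candidate whose label differs
    from original_label, or None if no such candidate exists.

    Assumption: candidate_labels holds the class assigned to each generated
    sample; how those labels are obtained is outside this sketch.
    """
    best = None
    for i, (cand, label) in enumerate(zip(candidates, candidate_labels)):
        if label == original_label:
            continue  # same class as the input, so not adversarial
        d = l2_distance(original, cand)
        if best is None or d < best[1]:
            best = (i, d)
    return best


# Toy data standing in for generated boundary samples (hypothetical values).
original = [0.0, 0.0]
candidates = [[0.1, 0.0], [0.3, 0.4], [1.0, 1.0]]
labels = [0, 1, 2]

result = select_closest_adversarial(original, 0, candidates, labels)
print(result)
```

In this toy case the nearest candidate overall shares the input's label and is skipped, so the second candidate is returned. A single pass over the pool is enough, which is one way to read the "single step" claim, though the actual procedure may differ.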
Abstract
Surrogate-based black-box attacks have exposed the heightened vulnerability of DNNs. These attacks craft adversarial examples for arbitrary samples while using black-box feedback from the target on only a given set of samples. State-of-the-art surrogate-based attacks train a discriminative surrogate that mimics the target's outputs, with the goal of learning the target's decision boundaries. The surrogate is then attacked with white-box attacks to craft adversarial examples that are similar to the original samples but belong to other classes. With limited samples, the discriminative surrogate fails to learn the target's decision boundaries accurately, and these surrogate-based attacks suffer from low success rates. In contrast to the discriminative approach, we propose a generative surrogate that learns the distribution of samples residing on or close to the target's decision boundaries. The distribution learned by the generative surrogate can be used to craft adversarial examples that differ imperceptibly from the original samples but belong to other classes. The proposed generative approach results in attacks with remarkably high attack success rates on various targets and datasets.
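One way to make "on or close to the decision boundaries" concrete is the gap between the two largest class probabilities returned by a classifier: a small gap means the sample sits near the boundary between those two classes. The sketch below is only an illustration under that assumption; the abstract does not state how closeness is measured, and the function names, the threshold, and the probability vectors are hypothetical.

```python
def top2_margin(probs):
    """Difference between the two largest class probabilities.

    Assumption: probs is a probability vector over at least two classes,
    as a classifier's softmax output would be.
    """
    if len(probs) < 2:
        raise ValueError("need at least two class probabilities")
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def near_boundary(probs, threshold=0.1):
    """True if the top-2 margin is at most threshold (a hypothetical cutoff)."""
    return top2_margin(probs) <= threshold


# Hypothetical classifier outputs for two samples.
confident = [0.90, 0.05, 0.05]   # far from any boundary
ambiguous = [0.48, 0.45, 0.07]   # close to the boundary between classes 0 and 1

print(near_boundary(confident), near_boundary(ambiguous))
```

Under this reading, a small perturbation of a low-margin sample is more likely to change the predicted class, which is why samples near the boundary are useful starting points for adversarial examples.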
