A Generative Approach to Surrogate-based Black-box Attacks

Raha Moraffah, Huan Liu

TL;DR

This work tackles black-box adversarial attacks by replacing discriminative surrogates with a GAN-based generative surrogate that learns the distribution of samples lying on or near the target model's decision boundaries. The two-stage GSBA framework first trains the generator to produce realistic, boundary-proximate, and class-controlled samples, supplemented by knowledge distillation from the unknown target. It then exploits this distribution by selecting the closest generated example to the original input, achieving adversarial misclassification in a single step with strong attack success rates and high query efficiency. Empirical results on CIFAR-10/100 demonstrate substantial improvements over state-of-the-art surrogate-based attacks, along with thorough ablations and budget analyses confirming the method's practicality and robustness.
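The second stage, selecting the generated sample closest to the original input, can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes the generator has already produced a pool of `candidates` with known class labels, and it uses L2 distance as the notion of "closest", which the summary does not specify. The function name `select_adversarial` and all parameter names are hypothetical.

```python
import numpy as np


def select_adversarial(x, candidates, candidate_labels, original_label, target_label=None):
    """Pick the generated sample closest to x (L2 distance, an assumption).

    Untargeted (target_label is None): search candidates from any class
    other than original_label.
    Targeted: search only candidates labelled target_label.
    Returns (index into candidates, distance to x).
    """
    candidates = np.asarray(candidates, dtype=np.float64)
    candidate_labels = np.asarray(candidate_labels)

    # Decide which generated samples are admissible for this attack mode.
    if target_label is None:
        mask = candidate_labels != original_label
    else:
        mask = candidate_labels == target_label
    if not mask.any():
        raise ValueError("no candidate from an admissible class")

    # Flatten so the same code works for vectors and images.
    flat = candidates.reshape(len(candidates), -1)
    x_flat = np.asarray(x, dtype=np.float64).reshape(1, -1)
    dists = np.linalg.norm(flat - x_flat, axis=1)

    # Exclude inadmissible candidates from the argmin.
    dists[~mask] = np.inf
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])
```

Because the selection is a single nearest-neighbour lookup over generated samples, it needs no iterative perturbation of the input, which is consistent with the single-step claim above.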

Abstract

Surrogate-based black-box attacks have exposed the heightened vulnerability of DNNs. These attacks are designed to craft adversarial examples for any sample while receiving black-box target feedback for only a given set of samples. State-of-the-art surrogate-based attacks train a discriminative surrogate that mimics the target's outputs, with the goal of learning the target's decision boundaries. The surrogate is then attacked with white-box attacks to craft adversarial examples that are similar to the original samples but belong to other classes. With limited samples, the discriminative surrogate fails to learn the target's decision boundaries accurately, and these surrogate-based attacks suffer from low success rates. In contrast to the discriminative approach, we propose a generative surrogate that learns the distribution of samples residing on or close to the target's decision boundaries. The distribution learned by the generative surrogate can be used to craft adversarial examples that have imperceptible differences from the original samples but belong to other classes. The proposed generative approach results in attacks with remarkably high attack success rates on various targets and datasets.

Paper Structure (14 sections, 9 equations, 3 figures, 4 tables, 1 algorithm)

Figures (3)

  • Figure 1: Illustration of the attack on discriminative vs. generative surrogates. Panel (a) shows the target decision boundary for two classes. Panel (b) illustrates the discriminative surrogate, which aims to learn the target decision boundary with limited samples. Panel (c) shows a generative surrogate that learns the distribution of samples residing on or close to the target decision boundary. The adversarial example (in $\square$) for the original sample (in $\circ$) cannot be identified using the discriminative surrogate because its decision boundary is wrong, whereas with the generative surrogate the adversarial example is sampled directly from the learned distribution.
  • Figure 2: An overview of the GSBA framework. (1) S1: Generative Surrogate Training. This step trains a surrogate that learns the distribution of samples with three properties: (i) realness ($\mathcal{L}_{\operatorname{G}}$, $\mathcal{L}^{adv}_{\operatorname{G}}$, and $\mathcal{L}^{y}_{\operatorname{cnt}}$); (ii) high inter-class similarity ($\mathcal{L}^{y}_{\operatorname{sim}}$); and (iii) high intra-class diversity ($\mathcal{L}^{y}_{\operatorname{sim}}$). Once trained, the surrogate is used directly to generate adversarial examples. (2) S2: Attack Strategy. The original benign sample is shown with $\blacksquare$. Under the untargeted setting, the surrogate-generated sample most similar to the original sample is first identified for each class other than the original class (denoted by $x^*_i$). The sample with minimum distance from the original sample is then selected as the adversarial example ($x^*_{3}$). Under the targeted setting, the most similar sample from the target class (shown hatched) is selected as the adversarial example ($x^*_{0}$).
  • Figure 3: Attack success rates (ASR) with different query budgets
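For readers unfamiliar with the metric in Figure 3, the sketch below computes attack success rate under its common definition: the fraction of crafted examples that the target misclassifies (untargeted) or assigns to the chosen class (targeted). The paper's exact definition may differ, for example in how originally misclassified samples are handled; the function name and parameters are hypothetical.

```python
import numpy as np


def attack_success_rate(target_preds, true_labels, target_labels=None):
    """Fraction of adversarial examples that succeed against the target model.

    target_preds: labels the target model assigns to the adversarial examples.
    true_labels: ground-truth labels of the corresponding original samples.
    target_labels: desired labels for a targeted attack, or None for untargeted.
    """
    target_preds = np.asarray(target_preds)
    true_labels = np.asarray(true_labels)
    if target_labels is None:
        # Untargeted: any prediction other than the true label counts.
        success = target_preds != true_labels
    else:
        # Targeted: only the requested label counts.
        success = target_preds == np.asarray(target_labels)
    return float(success.mean())
```

Plotting this value against the number of target queries allowed would give a curve of the kind the caption describes.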