STBA: Towards Evaluating the Robustness of DNNs for Query-Limited Black-box Scenario

Renyang Liu, Kwok-Yan Lam, Wei Zhou, Sixing Wu, Jun Zhao, Dongting Hu, Mingming Gong

TL;DR

The Spatial Transform Black-box Attack (STBA) is a novel framework for crafting adversarial examples in the query-limited black-box scenario. Compared to existing score-based baselines, it is reported to improve the imperceptibility of adversarial examples and boost the attack success rate.

Abstract

Many attack techniques have been proposed to explore the vulnerability of DNNs and to help improve their robustness. Despite significant recent progress, existing black-box attack methods still perform unsatisfactorily because of the vast number of queries needed to optimize the desired perturbations. Another critical challenge is that adversarial examples built by adding noise look abnormal and struggle to attack robust models, whose robustness is enhanced by adversarial training against small perturbations. Both issues significantly increase the risk of exposure and prevent a deep exploration of the vulnerability of DNNs. Hence, it is necessary to evaluate the fragility of DNNs under query-limited settings in a non-additive way. In this paper, we propose the Spatial Transform Black-box Attack (STBA), a novel framework to craft formidable adversarial examples in the query-limited scenario. Specifically, STBA adopts two processes to enhance the naturalness of adversarial examples and improve query efficiency: a) we apply an estimated flow field to the high-frequency part of clean images to generate adversarial examples, instead of introducing external noise to the benign image, and b) we leverage an efficient gradient estimation method based on a batch of samples to optimize such a flow field under query-limited settings. Extensive experiments indicate that, compared to existing score-based black-box baselines, STBA improves the imperceptibility of the adversarial examples and boosts the attack success rate under query-limited settings.
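
As a rough illustration of process (a), the sketch below splits a grayscale image into low- and high-frequency parts, warps only the high-frequency part with a flow field, and adds the low-frequency part back. The box blur used as the low-pass filter, the bilinear sampler, and all function names (`box_blur`, `warp`, `flow_adversarial`) are assumptions chosen for illustration; the abstract does not specify how the paper performs the decomposition or the warp.

```python
import numpy as np

def box_blur(img, k=3):
    """Low-pass filter (assumed): mean over a k x k window with edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def warp(img, flow):
    """Bilinear resampling: output pixel (i, j) reads img at
    (i + flow[0, i, j], j + flow[1, i, j]), clamped to the image border."""
    h, w = img.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    y = np.clip(ys + flow[0], 0, h - 1)
    x = np.clip(xs + flow[1], 0, w - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom

def flow_adversarial(x, flow):
    """Warp the high-frequency part of x, then add the low-frequency part back."""
    x_low = box_blur(x)
    x_high = x - x_low
    return np.clip(x_low + warp(x_high, flow), 0.0, 1.0)
```

With a zero flow field the function returns the input image unchanged, so the size of the flow directly controls how far the result departs from the clean image.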

Paper Structure (17 sections, 14 equations, 9 figures, 5 tables, 1 algorithm)

Figures (9)

  • Figure 1: Clean examples and the corresponding adversarial examples generated by the baselines and the proposed STBA. The first image is the clean image, and the following ones are the adversarial examples generated by Square Attack [eccv/AndriushchenkoC20], AdvFlow [nips/DolatabadiEL20], RS_Attack [aaai/CroceASF022], ASH [mir/LiH00S22], NPAttack [pr/BaiWZJX23], and DifAttack [aaai/00710ZT24], respectively.
  • Figure 2: Overview of the Spatial Transform Black-box Attack (STBA). The Black-box Model is the victim model. $\bm{x}$ and $\bm{x}^{adv}$ are the benign image and its adversarial counterpart, respectively. $\bm{x}_{high}$ and $\bm{x}_{low}$ represent the high-frequency part and the low-frequency part of the benign image. $\otimes$ represents the spatial transformation operation, and $\oplus$ denotes element-wise addition. $\bm{f}_i, i\in \{1,2,\dots,n\}$ are the mini-batch of candidate flow fields sampled from an isotropic normal distribution $\mathcal{N}$, and $\bm{f}^{adv}$ is the final optimized flow field that is applied to the original clean image to produce the final adversarial image $\bm{x}^{adv}$.
  • Figure 3: Attack success rate vs. number of queries for the baselines and the proposed method on clean models, with the maximum number of queries set to 1000. The bold red lines are the results of the proposed STBA.
  • Figure 4: Attack success rate vs. number of queries for the baselines and the proposed method on robust models, with the maximum number of queries set to 10000. The bold red lines are the results of the proposed STBA.
  • Figure 5: Transferability confusion matrices for adversarial examples generated by the baselines and the proposed STBA on CIFAR-10, CIFAR-100, STL-10 and ImageNet, respectively. Each row indicates the model targeted when generating the adversarial examples, while each column indicates the model attacked with those examples.
  • ...and 4 more figures
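
The optimization loop described in the Figure 2 caption (sample a mini-batch of candidate flow fields from a normal distribution, query the model, update toward a final flow field) can be sketched with an NES-style score-function gradient estimator. This is one common choice for score-based black-box attacks and is an assumption here, not a confirmed description of the paper's estimator. The names `estimate_gradient` and `optimize_flow`, the hyperparameters, and the toy quadratic `score_fn` are hypothetical; in an actual attack, `score_fn` would wrap a query to the victim model and return an adversarial loss.

```python
import numpy as np

def estimate_gradient(score_fn, mean, sigma, n, rng):
    """NES-style estimate of the gradient of E[score_fn(mean + sigma * eps)]
    with respect to mean, using n queries to score_fn."""
    eps = rng.standard_normal((n,) + mean.shape)
    scores = np.array([score_fn(mean + sigma * e) for e in eps])
    scores = scores - scores.mean()  # baseline subtraction to reduce variance
    return np.tensordot(scores, eps, axes=1) / (n * sigma)

def optimize_flow(score_fn, shape, steps=50, n=20, sigma=0.1, lr=0.05, seed=0):
    """Gradient descent on the mean flow field; uses steps * n queries in total."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(shape)
    for _ in range(steps):
        mean -= lr * estimate_gradient(score_fn, mean, sigma, n, rng)
    return mean

# Toy stand-in for the black-box loss (hypothetical): distance to a fixed flow.
target = np.full((2, 4, 4), 0.3)
toy_score = lambda f: float(np.sum((f - target) ** 2))
f_adv = optimize_flow(toy_score, target.shape)
```

The query budget is explicit in this form: each update costs `n` queries, so a limit of 1000 queries corresponds to `steps * n <= 1000`.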