Improving the Transferability of Adversarial Attacks by an Input Transpose

Qing Wan, Shilong Deng, Xun Wang

TL;DR

This work tackles the poor cross-model transferability of adversarial examples by proposing an input transpose method as a near-zero-cost technique. The core idea is to transpose input images, and additionally to explore tiny $1^\circ$ rotations, to boost the black-box transferability of both existing and ensemble-based attacks. Results show substantial gains on NIPS'17 and, to a lesser extent, on CIFAR-10, with improvements of up to 803% in single-model transferability and up to 347% in ensemble settings. A $1^\circ$ rotation also yields notable gains on NIPS'17, which the analysis tentatively attributes to pattern shifts in low-level feature maps. The findings suggest a dataset- and model-dependent optimal angle range (approximately $210^\circ$-$240^\circ$ under unlimited queries) and highlight a low-cost avenue for refining transferable adversarial attacks, with practical implications for robustness evaluation and defense design.

Abstract

Deep neural networks (DNNs) are highly susceptible to adversarial examples--subtle perturbations applied to inputs that are often imperceptible to humans yet lead to incorrect model predictions. In black-box scenarios, however, existing adversarial examples exhibit limited transferability and struggle to effectively compromise multiple unseen DNN models. Previous strategies enhance the cross-model generalization of adversarial examples by introducing versatility into adversarial perturbations, thereby improving transferability. However, further refining perturbation versatility often demands intricate algorithm development and substantial computation consumption. In this work, we propose an input transpose method that requires almost no additional labor and computation costs but can significantly improve the transferability of existing adversarial strategies. Even without adding adversarial perturbations, our method demonstrates considerable effectiveness in cross-model attacks. Our exploration finds that on specific datasets, a mere $1^\circ$ left or right rotation might be sufficient for most adversarial examples to deceive unseen models. Our further analysis suggests that this transferability improvement triggered by rotating only $1^\circ$ may stem from visible pattern shifts in the DNN's low-level feature maps. Moreover, this transferability exhibits optimal angles that, when identified under unrestricted query conditions, could potentially yield even greater performance.
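The input transpose mentioned above can be illustrated with a minimal sketch. It assumes that "transpose" means swapping the height and width axes of an already-crafted adversarial example before it is sent to the black-box model; the abstract does not spell out the exact operation, so this is an assumption. The function name `transpose_image` and the toy image are hypothetical and used only for illustration.

```python
from typing import List, Sequence, TypeVar

Pixel = TypeVar("Pixel")


def transpose_image(image: Sequence[Sequence[Pixel]]) -> List[List[Pixel]]:
    """Swap the height and width axes of an H x W image.

    Each pixel may be a scalar or a tuple of channel values; the channel
    values themselves are left untouched (assumption: only the two
    spatial axes are swapped).
    """
    if not image:
        return []
    width = len(image[0])
    if any(len(row) != width for row in image):
        raise ValueError("all rows must have the same length")
    # zip(*image) yields the columns of the image; each column becomes a row.
    return [list(column) for column in zip(*image)]


# Hypothetical 2 x 3 RGB "adversarial example", for illustration only.
adv_example = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(10, 10, 10), (20, 20, 20), (30, 30, 30)],
]

# The transposed image is 3 x 2; this is what would be fed to the unseen model.
transposed = transpose_image(adv_example)
```

With an array library the same step is a single axis swap, for example `numpy.transpose(x, (1, 0, 2))` for an H x W x C array. This is consistent with the claim that the method adds almost no computation on top of an existing attack.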

Paper Structure

This paper contains 22 sections, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Identify major feature fluctuations that potentially influence the $\text{Inc-v3}_{ens3}$'s decision under the single model setting. We use the IncRes-v2 as the white-box model and the DIM as the baseline attack. We illustrate five typical subplots: $(1)$ An AE that fails to deceive the $\text{Inc-v3}_{ens3}$. $(2)$ The $1^\circ$ left rotation of $(1)$ that successfully deceives the $\text{Inc-v3}_{ens3}$. $(3)$ Feature maps from $16$ selected channels in the Conv2d-1a layer of the $\text{Inc-v3}_{ens3}$ with $(1)$ as the input. $(4)$ Feature maps of the same $16$ channels with $(2)$ as the input. $(5)$ Absolute differences between $(3)$ and $(4)$, magnified for clarity. Note: Channels in $(3)$ and $(4)$ are selected based on the top 16 indices from $(5)$, ranked by descending mean absolute differences between feature maps with $(1)$ and $(2)$ as the input across all channels in the Conv2d-1a.
  • Figure 2: Identify the optimal rotation angle for maximizing transferability across six black-box models (NIPS'17). This figure is generated under the single-model setting with the Inc-v3 as the white-box model and DIM as the baseline attack. The x-axis represents the counter-clockwise rotation angle from $0^\circ$ to $360^\circ$, and the y-axis indicates the transferable attack success rates (%). The six curves depict the fluctuations in attack success rates for each black-box model as the rotation angle varies.
  • Figure 3: Identify the optimal rotation angle for maximizing transferability across three black-box models (CIFAR-10). The white-box model is the VGG-19BN (upper) or the DenseNet-BC (lower). Please refer to Figure 2 for other details.
  • Figure 4: Identify major feature fluctuations that potentially influence the $\text{Inc-v3}_{ens4}$'s decision under the single model setting. The white-box model and the baseline are consistent with those in Figure 1. We analyze the black-box model $\text{Inc-v3}_{ens4}$. $(1)$ An AE that fails to deceive the $\text{Inc-v3}_{ens4}$. $(2)$ The $1^\circ$ left rotation of $(1)$ that successfully deceives the $\text{Inc-v3}_{ens4}$. $(3)$ Feature maps from $16$ selected channels in the Conv2d-1a layer of the $\text{Inc-v3}_{ens4}$ with $(1)$ as the input. As in Figure 1, the $16$ channels are determined by $(5)$. $(4)$ Feature maps of the same $16$ channels with $(2)$ as the input. $(5)$ Absolute differences between $(3)$ and $(4)$ (zoom in to see details).
  • Figure 5: Identify major feature fluctuations that potentially influence the decisions of Res-101 (upper row) and IncRes-v2$_{ens}$ (lower row) under the single model setting. The white-box model and the baseline are consistent with those in Figure 1. $(2)$ The $1^\circ$ right rotation of $(1)$ that successfully deceives the corresponding black-box model. $(3)$ The upper feature maps are selected from the Conv1 layer of Block 1 in Res-101, while the lower feature maps are from the Conv2d-1a layer in IncRes-v2$_{ens}$. Additional details can be found in Figure 4.
  • ...and 6 more figures
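
The channel selection described in the Figure 1 caption (rank all channels by descending mean absolute difference between the two sets of feature maps, then keep the top 16) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: feature maps are represented as plain nested lists of shape channels x H x W, and the names `mean_abs_diff`, `top_k_channels`, and the toy data are hypothetical.

```python
from typing import List, Sequence

FeatureMap = Sequence[Sequence[float]]  # one channel, H x W


def mean_abs_diff(a: FeatureMap, b: FeatureMap) -> float:
    """Mean absolute element-wise difference between two H x W maps."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for value_a, value_b in zip(row_a, row_b):
            total += abs(value_a - value_b)
            count += 1
    if count == 0:
        raise ValueError("feature maps must not be empty")
    return total / count


def top_k_channels(
    feats_a: Sequence[FeatureMap],
    feats_b: Sequence[FeatureMap],
    k: int = 16,
) -> List[int]:
    """Indices of the k channels whose maps differ most, largest first."""
    if len(feats_a) != len(feats_b):
        raise ValueError("both inputs must have the same number of channels")
    scores = [mean_abs_diff(a, b) for a, b in zip(feats_a, feats_b)]
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return ranked[:k]


# Toy data: 3 channels of 2 x 2 maps, standing in for the layer outputs
# obtained with the original AE and with its rotated version as input.
feats_original = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
feats_rotated = [
    [[0.5, 0.5], [0.5, 0.5]],  # mean |diff| = 0.5
    [[1.0, 1.0], [1.0, 5.0]],  # mean |diff| = 1.0
    [[2.0, 2.0], [2.0, 2.0]],  # mean |diff| = 0.0
]

selected = top_k_channels(feats_original, feats_rotated, k=2)
```

In this toy case the second channel changes most and the first channel next, so `selected` lists them in that order. In the figures, the analogous ranking is what determines the 16 channels displayed in subplots (3) and (4).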