Improving the Transferability of Adversarial Attacks by an Input Transpose
Qing Wan, Shilong Deng, Xun Wang
TL;DR
This work tackles the poor cross-model transferability of adversarial examples by proposing an input transpose method, a near-zero-cost technique. The core idea is to transpose input images, and the authors additionally explore tiny $1^\circ$ rotations, to boost the black-box transferability of existing attacks, including ensemble-based ones. Results show substantial gains on NIPS'17 and, to a lesser extent, CIFAR-10, with improvements of up to 803% in single-model transferability and up to 347% in ensemble settings. A $1^\circ$ rotation also yields notable gains on NIPS'17, which the analysis suggests may stem from shifts induced in low-level feature maps. The findings point to a dataset- and model-dependent optimal angle range (approximately $210^\circ$-$240^\circ$ under unlimited queries) and highlight a low-cost avenue for refining transferable adversarial attacks, with practical implications for robustness evaluation and defense design.
Abstract
Deep neural networks (DNNs) are highly susceptible to adversarial examples: inputs carrying subtle perturbations that are often imperceptible to humans yet lead to incorrect model predictions. In black-box scenarios, however, existing adversarial examples exhibit limited transferability and struggle to compromise multiple unseen DNN models. Previous strategies improve transferability by introducing versatility into adversarial perturbations, which enhances the cross-model generalization of the resulting examples. However, further refining perturbation versatility often demands intricate algorithm development and substantial computational cost. In this work, we propose an input transpose method that requires almost no additional labor or computation but can significantly improve the transferability of existing adversarial strategies. Even without adding adversarial perturbations, our method demonstrates considerable effectiveness in cross-model attacks. Our exploration finds that on specific datasets, a mere $1^\circ$ left or right rotation might be sufficient for most adversarial examples to deceive unseen models. Our further analysis suggests that the transferability improvement triggered by rotating only $1^\circ$ may stem from visible pattern shifts in the DNN's low-level feature maps. Moreover, this transferability exhibits optimal angles that, when identified under unrestricted query conditions, could potentially yield even greater performance.
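As a rough illustration of the two input transformations described above, the following minimal sketch assumes a batch of images stored as a NumPy array in (N, C, H, W) layout. The function names `transpose_input` and `rotate_small`, the array layout, the zero fill at the borders, and the nearest-neighbour resampling are all assumptions made for this example, not the paper's implementation. How the transformed input is combined with a specific attack is not specified here and is left out.

```python
import numpy as np


def transpose_input(x):
    """Swap the two spatial axes of an (N, C, H, W) batch.

    Hypothetical helper: each image is transposed, so pixel (i, j)
    of the output equals pixel (j, i) of the input.
    """
    return np.swapaxes(x, 2, 3)


def rotate_small(x, degrees):
    """Rotate an (N, C, H, W) batch about the image centre.

    Hypothetical helper using inverse mapping with nearest-neighbour
    sampling; pixels whose source falls outside the image are set to 0.
    The sign convention of `degrees` is arbitrary in this sketch.
    """
    n, c, h, w = x.shape
    theta = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0

    # Coordinates of every output pixel.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # For each output pixel, locate the source pixel it is copied from.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x = np.rint(src_x).astype(int)
    src_y = np.rint(src_y).astype(int)

    # Keep only output pixels whose source lies inside the image.
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)

    out = np.zeros_like(x)
    out[..., valid] = x[..., src_y[valid], src_x[valid]]
    return out
```

In this sketch a candidate input would be passed through `transpose_input` or `rotate_small(x, 1.0)` before being fed to an unseen model. A production pipeline would more likely use an interpolating rotation from an image library; nearest-neighbour sampling is used here only to keep the example dependency-free beyond NumPy.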
