Boosting Adversarial Transferability by Block Shuffle and Rotation
Kunyu Wang, Xuanran He, Wenxuan Wang, Xiaosen Wang
TL;DR
This work tackles the transferability gap in adversarial attacks by examining attention heatmaps and showing that existing input transformation based attacks yield attention heatmaps that differ from model to model. It introduces Block Shuffle and Rotation (BSR), a transformation that splits an image into blocks, randomly shuffles and rotates them, and averages the gradient over multiple transformed views, breaking the intrinsic relations within the image so that the resulting heatmaps are more consistent across models. Empirical results on ImageNet show that BSR substantially improves transferability over state-of-the-art input transformation based attacks in both single-model and ensemble-model settings, and that it remains effective against several defenses; it can also be combined with other input transformations for further gains. The approach highlights a practical security risk in black-box scenarios and offers a general way to enhance transferability by making attention patterns more consistent across models. Code is publicly available.
Abstract
Adversarial examples mislead deep neural networks with imperceptible perturbations and pose a significant threat to deep learning. An important property is their transferability, that is, their ability to deceive models other than the one they were crafted on, which enables attacks in the black-box setting. Though various methods have been proposed to boost transferability, their performance still falls short of white-box attacks. In this work, we observe that existing input transformation based attacks, one of the mainstream families of transfer-based attacks, result in different attention heatmaps on different models, which might limit transferability. We also find that breaking the intrinsic relation of the image can disrupt the attention heatmap of the original image. Based on this finding, we propose a novel input transformation based attack called block shuffle and rotation (BSR). Specifically, BSR splits the input image into several blocks, then randomly shuffles and rotates these blocks to construct a set of new images for gradient calculation. Empirical evaluations on the ImageNet dataset demonstrate that BSR achieves significantly better transferability than existing input transformation based methods under single-model and ensemble-model settings. Combining BSR with existing input transformation methods further improves transferability and significantly outperforms the state-of-the-art methods. Code is available at https://github.com/Trustworthy-AI-Group/BSR
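The transformation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how blocks are split or which rotation angles are used, so this sketch assumes an even grid of square blocks and rotations by multiples of 90 degrees. The names `block_shuffle_rotate`, `transformed_views`, `n_blocks`, and `n_views` are hypothetical and chosen only for illustration.

```python
import numpy as np


def block_shuffle_rotate(image, n_blocks=2, rng=None):
    """Return one shuffled-and-rotated view of a square (H, W, C) image.

    Assumptions (not taken from the paper): an even n_blocks x n_blocks grid
    and per-block rotations restricted to multiples of 90 degrees.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if h != w or h % n_blocks != 0:
        raise ValueError("expected a square image divisible by n_blocks")
    size = h // n_blocks

    # Split the image into square blocks in row-major order.
    blocks = [
        image[r * size:(r + 1) * size, c * size:(c + 1) * size]
        for r in range(n_blocks)
        for c in range(n_blocks)
    ]

    # Randomly permute the blocks, rotate each one, and reassemble.
    order = rng.permutation(len(blocks))
    out = np.empty_like(image)
    for slot, src in enumerate(order):
        k = int(rng.integers(0, 4))  # number of 90-degree turns
        r, c = divmod(slot, n_blocks)
        out[r * size:(r + 1) * size, c * size:(c + 1) * size] = np.rot90(
            blocks[src], k=k, axes=(0, 1)
        )
    return out


def transformed_views(image, n_views=4, n_blocks=2, seed=0):
    """Build a stack of independently transformed views of one image."""
    rng = np.random.default_rng(seed)
    return np.stack(
        [block_shuffle_rotate(image, n_blocks, rng) for _ in range(n_views)]
    )
```

In an attack loop, the gradient of the loss with respect to the input would be computed on each view and averaged before the perturbation update; that step depends on the surrogate model and framework and is omitted here.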
