Boosting Adversarial Transferability with Spatial Adversarial Alignment

Zhaoyu Chen, Haijing Guo, Kaixun Jiang, Jiyuan Fu, Xinyu Zhou, Dingkang Yang, Hao Tang, Bo Li, Wenqiang Zhang

TL;DR

This paper tackles the challenge of transferring adversarial perturbations across architectures, especially from CNNs to ViTs, by introducing Spatial Adversarial Alignment (SAA). SAA jointly enforces spatial-aware and adversarial-aware alignment between a surrogate model and a witness model: a global KL-divergence term and local per-region supervision provide the spatial alignment, and a self-adversarial strategy aligns features on adversarial examples. The approach fine-tunes the surrogate for a small number of epochs with no extra data, and its losses are combined into a single objective that guides the gradient updates. Empirical results on ImageNet show that SAA yields state-of-the-art cross-architecture transferability, enhances ensemble and feature-based attacks, and remains effective under several defenses. Ablations highlight the critical roles of local and adversarial alignment.

Abstract

Deep neural networks are vulnerable to adversarial examples that exhibit transferability across various models. Numerous approaches have been proposed to enhance the transferability of adversarial examples, including advanced optimization, data augmentation, and model modifications. However, these methods still show limited transferability, particularly in cross-architecture scenarios, such as from CNN to ViT. To achieve high transferability, we propose a technique termed Spatial Adversarial Alignment (SAA), which employs an alignment loss and leverages a witness model to fine-tune the surrogate model. Specifically, SAA consists of two key parts: spatial-aware alignment and adversarial-aware alignment. First, we minimize the divergences of features between the two models in both global and local regions, facilitating spatial alignment. Second, we introduce a self-adversarial strategy that leverages adversarial examples to impose further constraints, aligning features from an adversarial perspective. Through this alignment, the surrogate model is trained to concentrate on the common features extracted by the witness model. This facilitates adversarial attacks on these shared features, thereby yielding perturbations that exhibit enhanced transferability. Extensive experiments on various architectures on ImageNet show that aligned surrogate models based on SAA can provide more transferable adversarial examples, especially in cross-architecture attacks.
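
To make the structure of the objective concrete, the following is a minimal NumPy sketch of how a global term, a local per-region term, and an adversarial term could be combined into one alignment loss. It is an illustration under stated assumptions, not the paper's implementation: the function names, the use of KL divergence for the local term, the weights `w_local` and `w_adv`, and the tensor shapes are all hypothetical, and the models are replaced by precomputed logits.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p_logits, q_logits, eps=1e-12):
    # Mean KL(p || q) over all leading axes; distributions come from logits.
    p, q = softmax(p_logits), softmax(q_logits)
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def spatial_alignment_loss(sur_global, wit_global, sur_local, wit_local, w_local=1.0):
    # Global term: one distribution per image, shape (batch, classes).
    # Local term: one distribution per region, shape (batch, regions, classes).
    # Using KL for the local term is an assumption of this sketch.
    global_term = kl_divergence(wit_global, sur_global)
    local_term = kl_divergence(wit_local, sur_local)
    return global_term + w_local * local_term

def saa_objective(clean, adv, w_local=1.0, w_adv=1.0):
    # Hypothetical combination: the same spatial loss on clean inputs and on
    # adversarial examples, summed into a single scalar objective.
    clean_loss = spatial_alignment_loss(**clean, w_local=w_local)
    adv_loss = spatial_alignment_loss(**adv, w_local=w_local)
    return clean_loss + w_adv * adv_loss

# Toy usage with random logits standing in for surrogate and witness outputs.
rng = np.random.default_rng(0)
wit_g, wit_l = rng.normal(size=(2, 5)), rng.normal(size=(2, 4, 5))
clean = dict(sur_global=rng.normal(size=(2, 5)), wit_global=wit_g,
             sur_local=rng.normal(size=(2, 4, 5)), wit_local=wit_l)
adv = dict(sur_global=rng.normal(size=(2, 5)), wit_global=wit_g,
           sur_local=rng.normal(size=(2, 4, 5)), wit_local=wit_l)
print(saa_objective(clean, adv))
```

In an actual fine-tuning loop the surrogate's parameters would be updated to reduce this scalar while the witness stays fixed; the sketch only shows how the terms add up, and the loss is zero exactly when surrogate and witness outputs agree on both clean and adversarial inputs.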

Paper Structure

This paper contains 19 sections, 7 equations, 4 figures, 11 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Spatial Adversarial Alignment (SAA) consists of two parts: spatial-aware alignment and adversarial-aware alignment. Initially, we aim to minimize the feature divergences between the two models across both global and local regions, thereby promoting spatial alignment. Subsequently, we introduce a self-adversarial strategy that utilizes adversarial examples to impose additional constraints, aligning the adversarial features.
  • Figure 2: Grad-CAM visualizations comparing the feature distribution of unaligned and aligned surrogate models (Res50) on clean inputs and adversarial examples (generated by SSA-DI-TI-MI).
  • Figure 3: Ablation study on training epochs.
  • Figure 4: Grad-CAM on target models.