On the Adversarial Transferability of Generalized "Skip Connections"

Yisen Wang, Yichuan Mo, Dongxian Wu, Mingjie Li, Xingjun Ma, Zhouchen Lin

TL;DR

The paper reveals that skip connections inherently facilitate the transferability of adversarial examples. It proposes the Skip Gradient Method (SGM), which biases backpropagation toward skip-gradient pathways by applying a decay factor $\gamma\in(0,1]$ to residual modules, without extra computation, and extends this mechanism from ResNet-like CNNs to Vision Transformers, models with length-varying paths, NAS architectures, and even Large Language Models. The authors provide a theoretical account via Alignment between Adversarial and Intrinsic attacks (AAI) showing that SGM can align attack directions more closely with data-distribution–moving directions, and back this with extensive experiments across 31 attacks, numerous architectures, ensembles, and defense scenarios, demonstrating substantial transferability gains. They also supply intuitive visualizations and demonstrate LLM applicability, underscoring architectural characteristics as a critical factor in adversarial robustness and suggesting new directions for secure architectural design.

Abstract

Skip connections are an essential ingredient that allows modern deep models to be deeper and more powerful. Despite their huge success in normal scenarios (state-of-the-art classification performance on natural examples), we investigate and identify an interesting property of skip connections under adversarial scenarios: the use of skip connections allows easier generation of highly transferable adversarial examples. Specifically, in ResNet-like models (with skip connections), we find that using more gradients from the skip connections than from the residual modules, according to a decay factor applied during backpropagation, allows one to craft adversarial examples with high transferability. We term this method the Skip Gradient Method (SGM). Although SGM starts from ResNet-like models in the vision domain, we further extend it to more advanced architectures, including Vision Transformers (ViTs) and models with length-varying paths, and to other domains, namely natural language processing. We conduct comprehensive transfer attacks against various models including ResNets, Transformers, Inceptions, Neural Architecture Search models, and Large Language Models (LLMs). We show that employing SGM can greatly improve the transferability of crafted attacks in almost all cases. Furthermore, given the complexity of practical use, we demonstrate that SGM can also improve transferability for ensembles of models and for targeted attacks, as well as stealthiness against current defenses. Finally, we provide theoretical explanations and empirical insights on how SGM works. Our findings not only motivate new adversarial research into the architectural characteristics of models but also open up further challenges for secure model architecture design. Our code is available at https://github.com/mo666666/SGM.
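
As a rough illustration of the decay idea described in the abstract, the sketch below uses a toy scalar residual network rather than the authors' implementation. It assumes each block has the form z_{i+1} = z_i + f_i(z_i), so the per-block derivative is 1 + f_i'(z_i), and that the decay factor rescales only the residual term to give 1 + gamma * f_i'(z_i). The residual function `w * tanh(z)`, the function names, and the weights are hypothetical choices made only for this example.

```python
import math


def forward(x, weights):
    """Run a toy scalar residual network z_{i+1} = z_i + w_i * tanh(z_i)."""
    zs = [x]
    for w in weights:
        z = zs[-1]
        zs.append(z + w * math.tanh(z))
    return zs  # all activations, input first


def sgm_input_gradient(x, weights, gamma=1.0):
    """Derivative of the final activation w.r.t. x, with the residual-branch
    derivative scaled by gamma at every block (gamma=1 is plain backprop)."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma is expected in (0, 1]")
    zs = forward(x, weights)
    grad = 1.0
    for z, w in zip(zs[:-1], weights):
        residual_grad = w * (1.0 - math.tanh(z) ** 2)  # d f_i / d z_i
        # Skip path contributes 1; residual path is decayed by gamma.
        grad *= 1.0 + gamma * residual_grad
    return grad


if __name__ == "__main__":
    weights = [0.5, -0.3, 0.8]  # hypothetical residual-branch weights
    for gamma in (1.0, 0.5, 0.2):
        print(gamma, sgm_input_gradient(0.2, weights, gamma))
```

With `gamma=1.0` the function returns the ordinary gradient. As `gamma` shrinks, the result approaches the pure skip-path gradient, which is 1 in this scalar toy. In a real network the same rescaling would be applied to the gradients flowing back through each residual module.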

Paper Structure

This paper contains 27 sections, 1 theorem, 20 equations, 7 figures, 13 tables.

Key Result

Proposition 1

Consider the following binary-classification residual model: with ${\bm{x}}\in \mathbb{R}^2$, $\hat{{\bm{y}}} \in \mathbb{R}^2$ is the one-hot label vector, and $g({\bm{x}})$ is a residual block with learnable parameters. If the attack is generated with the hinge loss on a certain class: with $y$ as a label. If $\|\nabla_{\bm{x}} g({\bm{x}})\|_F \leq 1$ and $0\leq\frac{\partial^2 g}{\p

Figures (7)

  • Figure 1: Illustration of the last 3 skip connections (green lines) and residual modules (black boxes) of an ImageNet-trained ResNet-18, together with the success rates ("white-box/black-box") of adversarial attacks crafted using gradients flowing through either a skip connection (b) or a convolution module (c) at each junction point (circle). The attacks are crafted by BIM on 5000 ImageNet validation images under a maximum $L_{\infty}$ perturbation of $\epsilon = 16$ (pixel values are in $[0,255]$). The black-box success rate is tested against a VGG19 target model.
  • Figure 2: Diagrams for performing SGM during the backpropagation on various prevailing architectures.
  • Figure 3: Parameter tuning: the success rates of black-box attacks crafted by 10-step PGD combined with SGM with varying decay parameter $\gamma \in [0.1, 1.0]$. Each figure represents one kind of source model and the curves represent results against different target models.
  • Figure 4: The sensitivity map of different models against images on the ImageNet dataset. The average confidence (%) of four architectures on the ground truth class is (a) Hammerhead: 99.47. (b) Toilet tissue: 95.38. (c) Espresso: 87.36. (d) Granny Smith: 99.54.
  • Figure 5: SmoothGrad of DenseNet-121 with varying $\gamma$. When we decay the gradient propagated from the skip connections step by step, the features gradually change from local to global.
  • ...and 2 more figures
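
The Figure 1 caption mentions attacks crafted by BIM under a maximum $L_{\infty}$ perturbation of $\epsilon = 16$ on pixels in $[0,255]$. The sketch below shows the generic iterative sign-gradient update with projection onto that $L_{\infty}$ ball. It is not the paper's experimental code: the step size, the number of steps, the function name `bim_linf`, and the `grad_fn` callback (which stands in for the gradient of the loss with respect to the input) are all assumptions made for illustration.

```python
def bim_linf(x, grad_fn, eps=16.0, step=2.0, n_steps=10, lo=0.0, hi=255.0):
    """Iterative sign-gradient ascent kept inside the L-infinity ball of
    radius eps around the clean input x (a flat list of pixel values)."""
    x_adv = list(x)
    for _ in range(n_steps):
        grad = grad_fn(x_adv)  # hypothetical callback: d loss / d input
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            moved = x_adv[i] + step * sign
            # Project into the eps-ball around the clean pixel ...
            moved = min(max(moved, x[i] - eps), x[i] + eps)
            # ... and then into the valid pixel range.
            x_adv[i] = min(max(moved, lo), hi)
    return x_adv


if __name__ == "__main__":
    # Toy gradient that always pushes pixel 0 up and pixel 1 down.
    print(bim_linf([100.0, 5.0, 250.0], lambda xs: [1.0, -1.0, 0.0]))
```

In a combination with SGM, `grad_fn` would return the decayed gradient instead of the ordinary one, and the update and projection steps would stay the same.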

Theorems & Definitions (2)

  • Proposition 1
  • proof