Why Does Little Robustness Help? A Further Step Towards Understanding Adversarial Transferability

Yechao Zhang, Shengshan Hu, Leo Yu Zhang, Junyu Shi, Minghui Li, Xiaogeng Liu, Wei Wan, Hai Jin

TL;DR

This work investigates adversarial transferability through the lens of surrogate-model properties, focusing on two key factors: loss-surface smoothness and gradient similarity. It reveals a trade-off in which adversarial training and data distribution shifts (e.g., from data augmentation) can improve one factor while degrading the other, so that transferability depends on both the perturbation budget and the dataset. The authors develop a general route to better surrogates by combining gradient regularization (especially input-space regularizers such as IR/JR) with sharpness-aware minimization (SAM). They show that pairing these surrogates with surrogate-independent methods (AE-generation strategies and LGV ensembles) substantially improves transferability in experiments on CIFAR-10, ImageNette, and MLaaS platforms. Overall, the paper argues for treating smoothness and gradient similarity jointly when analyzing and optimizing transfer attacks. It also explains why surrogates adversarially trained with a small budget can paradoxically produce more transferable AEs. Finally, it offers practical guidelines for constructing surrogates that are both smooth and gradient-aligned with likely targets.
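The following minimal NumPy sketch illustrates the "gradient regularization + SAM" recipe with one training step on a toy binary logistic-regression surrogate. It makes two assumptions. First, the regularizer is an IR-style penalty $\lambda \cdot \operatorname{mean}_i \|\nabla_{x}\ell_i\|_2^2$ on the input gradient. Second, the optimizer is the standard two-step SAM update: ascend the weights by $\rho\, g/\|g\|$, then descend using the gradient taken at that perturbed point. The function names `ir_loss_and_grads` and `sam_ir_step`, and the values of `lam`, `rho`, and `lr`, are illustrative assumptions. They are not the paper's implementation or hyperparameters, since the paper's surrogates are deep networks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ir_loss_and_grads(w, b, X, y, lam):
    """Mean cross-entropy plus lam * mean ||d loss_i / d x_i||^2 for a binary
    logistic-regression model (hypothetical toy surrogate).
    Returns (loss, grad_w, grad_b)."""
    z = X @ w + b
    p = sigmoid(z)
    r = p - y                  # d loss_i / d z_i
    s = p * (1.0 - p)          # d p_i / d z_i
    n = len(y)
    w2 = w @ w                 # ||w||^2
    ce = np.mean(np.logaddexp(0.0, z) - y * z)   # numerically stable cross-entropy
    # The input gradient of loss_i is r_i * w, so its squared norm is r_i^2 * ||w||^2.
    penalty = lam * np.mean(r ** 2) * w2
    grad_w = X.T @ r / n + lam * (2.0 * w2 * (X.T @ (r * s)) / n + 2.0 * np.mean(r ** 2) * w)
    grad_b = np.mean(r) + lam * 2.0 * w2 * np.mean(r * s)
    return ce + penalty, grad_w, grad_b

def sam_ir_step(w, b, X, y, lam=0.1, rho=0.05, lr=0.5):
    """One sharpness-aware (SAM) update on the input-gradient-regularized objective."""
    _, gw, gb = ir_loss_and_grads(w, b, X, y, lam)
    scale = rho / (np.sqrt(gw @ gw + gb ** 2) + 1e-12)
    # Ascent step: first-order worst-case weights inside an L2 ball of radius rho.
    _, gw_adv, gb_adv = ir_loss_and_grads(w + scale * gw, b + scale * gb, X, y, lam)
    # Descent step: apply the gradient taken at the perturbed weights to the original weights.
    return w - lr * gw_adv, b - lr * gb_adv
```

The linear model keeps everything in closed form, because each per-sample input gradient is $(p_i - y_i)\,w$. For a deep network, one would instead differentiate through the input gradient (double backpropagation) with an autodiff framework.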

Abstract

Adversarial examples (AEs) for DNNs have been shown to be transferable: AEs that successfully fool white-box surrogate models can also deceive other black-box models with different architectures. Although numerous empirical studies have provided guidance on generating highly transferable AEs, many of these findings lack explanations and some even lead to inconsistent advice. In this paper, we take a further step towards understanding adversarial transferability, with a particular focus on surrogate aspects. We start from the intriguing 'little robustness' phenomenon, where models adversarially trained with mildly perturbed adversarial samples can serve as better surrogates. We attribute it to a trade-off between two predominant factors: model smoothness and gradient similarity. Our investigations focus on their joint effects rather than their separate correlations with transferability. Through a series of theoretical and empirical analyses, we conjecture that the data distribution shift in adversarial training explains the degradation of gradient similarity. Building on these insights, we explore the impacts of data augmentation and gradient regularization on transferability. We find that the trade-off generally exists across various training mechanisms, which yields a comprehensive blueprint for the regulation mechanism behind transferability. Finally, we provide a general route for constructing better surrogates that boost transferability by optimizing model smoothness and gradient similarity simultaneously, e.g., by combining input gradient regularization with sharpness-aware minimization (SAM), and we validate it through extensive experiments. In summary, we call attention to the joint impact of these two factors when launching effective transfer attacks, rather than optimizing one while ignoring the other, and we emphasize the crucial role of manipulating surrogate models.
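The 'little robustness' setting can be illustrated with a minimal NumPy sketch of $\epsilon$-bounded adversarial training for a toy logistic-regression model. The sketch assumes a single-step $L_2$ attack (fast gradient method) for crafting the training perturbations, so that $\|\delta\|_2 = \epsilon$ for every sample. The helper names `fgm_l2` and `adv_train` and the training schedule are hypothetical. Setting `eps=0` recovers standard training, and small positive `eps` values give the kind of mildly robust models the abstract refers to. The paper's own adversarial training setup may use a different attack (e.g., a multi-step one).

```python
import numpy as np

def fgm_l2(w, b, X, y, eps):
    """Single-step L2 attack (FGM) on a logistic-regression model: moves each x_i
    by eps along its normalized input-gradient of the cross-entropy loss."""
    r = 1.0 / (1.0 + np.exp(-(X @ w + b))) - y   # d loss_i / d logit_i
    G = np.outer(r, w)                             # per-sample input gradients r_i * w
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    return X + eps * G / np.maximum(norms, 1e-12)  # zero gradient -> no perturbation

def adv_train(X, y, eps, steps=300, lr=0.5):
    """eps-robust training (hypothetical schedule): each gradient step fits the model
    on FGM examples crafted against the current weights; eps = 0 is standard training."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        X_adv = fgm_l2(w, b, X, y, eps)
        r = 1.0 / (1.0 + np.exp(-(X_adv @ w + b))) - y
        w -= lr * X_adv.T @ r / len(y)
        b -= lr * np.mean(r)
    return w, b
```

Models trained this way with `eps=0` and with a small `eps` could then serve as competing surrogates when crafting AEs against a separately trained target.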

Paper Structure

This paper contains 29 sections, 5 theorems, 31 equations, 10 figures, and 13 tables.

Key Result

Theorem 1

Given any sample $(x, y) \in \mathcal{D}$, let $x+\delta$ denote a perturbed version of $x$ with fooling probability $\operatorname{Pr}(\mathcal{F}(x+\delta) \neq y) \geq 1-\alpha$ and perturbation budget $\|\delta\|_{2} \leq \epsilon$. Then the transferability $\operatorname{Pr}\left(T_{r}(\ldots)\right)$ ... and $\epsilon$ be sufficiently large such that $\epsilon > c_{\mathcal{G}}$ and $\epsilon > c_{\ldots}$ ...
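The sketch below gives a rough empirical counterpart to the event $T_r$ in Theorem 1 by estimating how often AEs that fool a surrogate also fool a target. It assumes, as the fooling-probability condition suggests, that $\mathcal{F}$ is the surrogate and $\mathcal{G}$ the target. It also conditions on samples that both models classify correctly before the attack; this conditioning is an assumption and may differ from the paper's Definition 3. `transfer_rate` is a hypothetical helper that takes predicted labels as input.

```python
import numpy as np

def transfer_rate(y, surr_clean, surr_adv, tgt_clean, tgt_adv):
    """Empirical estimate of Pr(target fooled | surrogate fooled), restricted to
    samples that both models classify correctly on clean inputs."""
    y = np.asarray(y)
    both_correct = (np.asarray(surr_clean) == y) & (np.asarray(tgt_clean) == y)
    surr_fooled = both_correct & (np.asarray(surr_adv) != y)
    if not surr_fooled.any():
        return float("nan")                 # undefined: no AE fooled the surrogate
    tgt_fooled = surr_fooled & (np.asarray(tgt_adv) != y)
    return float(tgt_fooled.sum() / surr_fooled.sum())

# Sample 4 is excluded (surrogate wrong on the clean input); the surrogate's AEs
# fool samples 0, 1, 3, and the target is also fooled on samples 0 and 3.
rate = transfer_rate(y=[0, 1, 2, 3, 4],
                     surr_clean=[0, 1, 2, 3, 0], surr_adv=[1, 0, 2, 0, 0],
                     tgt_clean=[0, 1, 2, 3, 4], tgt_adv=[1, 1, 2, 2, 4])
print(rate)  # 2/3 ≈ 0.667
```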

Figures (10)

  • Figure 1: A complete overview of the relationship between factors regulating adversarial transferability of our study.
  • Figure 2: Transfer attack success rates (ASRs) against adversarially trained CIFAR-10 and ImageNette classifiers. For each $\epsilon$, we plot the average results of different surrogates over 3 random seeds, with the corresponding error bars.
  • Figure 4: Average gradient similarities between $\epsilon$-robust models and different target models with corresponding error bars at each $\epsilon$ over 3 different random seeds.
  • Figure 5: Gradient similarities between augmented surrogate and ST target CIFAR-10 models with error bars at each $\tau$.
  • Figure 6: Gradient similarities between augmented surrogate and ST target ImageNette models with error bars at each $\tau$.
  • ...and 5 more figures

Theorems & Definitions (13)

  • Definition 1: Model smoothness
  • Definition 2: Gradient similarity
  • Definition 3: Transferability
  • Theorem 1: Lower bound of transferability
  • Lemma 1
  • Proof
  • Lemma 2
  • Proof
  • Lemma 3
  • Proof
  • ...and 3 more
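The two quantities named in Definitions 1 and 2 can be estimated numerically. The sketch below assumes the common formulations. Gradient similarity is taken as the cosine similarity between the surrogate's and target's loss gradients with respect to the same input. Model smoothness is taken as a gradient-Lipschitz ($\beta$-smoothness) constant, probed here by a Monte-Carlo lower estimate around a single point. Both helper names are hypothetical. The paper's exact definitions (e.g., whether they take an infimum or an expectation over the data distribution) are not reproduced here.

```python
import numpy as np

def gradient_similarity(g_surr, g_tgt, eps=1e-12):
    """Cosine similarity between surrogate and target input-gradients of the loss
    at the same (x, y); 1 means perfectly aligned, -1 opposite."""
    g_surr, g_tgt = np.ravel(g_surr), np.ravel(g_tgt)
    return float(g_surr @ g_tgt / (np.linalg.norm(g_surr) * np.linalg.norm(g_tgt) + eps))

def local_smoothness(grad_fn, x, radius=0.1, n_probes=100, seed=0):
    """Monte-Carlo lower estimate of a local gradient-Lipschitz constant beta:
    max over random u with ||u|| = radius of ||grad(x + u) - grad(x)|| / ||u||.
    Smaller values indicate a smoother loss surface around x."""
    rng = np.random.default_rng(seed)
    g0 = grad_fn(x)
    best = 0.0
    for _ in range(n_probes):
        u = rng.normal(size=x.shape)
        u *= radius / np.linalg.norm(u)     # random direction on the sphere of given radius
        best = max(best, float(np.linalg.norm(grad_fn(x + u) - g0)) / radius)
    return best
```

On a quadratic loss $\tfrac12 x^\top A x$ with symmetric positive semidefinite $A$, the gradient is $Ax$. The estimate is therefore bounded above by the largest eigenvalue of $A$. With `A = np.diag([1.0, 4.0])`, it lies between 1 and 4 and approaches 4 as more probes are drawn.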