Why Does Little Robustness Help? A Further Step Towards Understanding Adversarial Transferability

Yechao Zhang; Shengshan Hu; Leo Yu Zhang; Junyu Shi; Minghui Li; Xiaogeng Liu; Wei Wan; Hai Jin

TL;DR

This work investigates adversarial transferability through the lens of surrogate-model properties, focusing on two key factors: loss-surface smoothness and gradient similarity. It reveals a trade-off in which adversarial training and data distribution shifts can improve one factor while degrading the other, so that their net effect on transferability depends on the perturbation budget and the dataset. The authors develop a general route to better surrogates by combining gradient regularization (especially input-space IR/JR) with sharpness-aware minimization (SAM), and show that pairing these with surrogate-independent methods (AE-generation strategies and LGV ensembles) yields substantially improved transferability in experiments on CIFAR-10, ImageNette, and MLaaS platforms. Overall, the paper argues for a unified treatment of smoothness and gradient similarity to understand and optimize adversarial transfer attacks. It also clarifies why adversarial training with a small perturbation budget can paradoxically aid transferability, and offers practical guidelines for constructing better surrogates.

Abstract

Adversarial examples (AEs) for DNNs have been shown to be transferable: AEs that successfully fool white-box surrogate models can also deceive other black-box models with different architectures. Although numerous empirical studies have provided guidance on generating highly transferable AEs, many of these findings lack explanations and even lead to inconsistent advice. In this paper, we take a further step towards understanding adversarial transferability, with a particular focus on surrogate aspects. Starting from the intriguing "little robustness" phenomenon, where models adversarially trained with mildly perturbed adversarial samples can serve as better surrogates, we attribute it to a trade-off between two predominant factors: model smoothness and gradient similarity. Our investigations focus on their joint effects, rather than their separate correlations with transferability. Through a series of theoretical and empirical analyses, we conjecture that the data distribution shift in adversarial training explains the degradation of gradient similarity. Building on these insights, we explore the impacts of data augmentation and gradient regularization on transferability and find that the trade-off generally exists across the various training mechanisms, thus building a comprehensive blueprint for the regulation mechanism behind transferability. Finally, we provide a general route for constructing better surrogates to boost transferability, which optimizes model smoothness and gradient similarity simultaneously, e.g., the combination of input gradient regularization and sharpness-aware minimization (SAM), validated by extensive experiments. In summary, we call for attention to the joint impact of these two factors when launching effective transfer attacks, rather than optimizing one while ignoring the other, and emphasize the crucial role of manipulating surrogate models.
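
To make the proposed combination of input gradient regularization and SAM more concrete, the following is a minimal sketch on a toy logistic-regression model, not the authors' implementation. The dataset, the names `regularized_loss`, `numerical_grad`, and `sam_step`, and the values of `lam`, `lr`, and `rho` are all assumptions chosen for illustration. Gradients are taken by finite differences to keep the example short.

```python
import math

# Toy binary dataset (hypothetical values, for illustration only).
DATA = [((1.0, 2.0), 1), ((2.0, 1.0), 1), ((-1.0, -1.5), 0), ((-2.0, -0.5), 0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def regularized_loss(w, lam=0.1):
    """Mean cross-entropy plus lam * squared L2 norm of the input gradient."""
    w_sq = sum(wi * wi for wi in w)
    total = 0.0
    for x, y in DATA:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        ce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        # For logistic regression the input gradient of the cross-entropy
        # is (p - y) * w, so its squared norm is (p - y)^2 * ||w||^2.
        total += ce + lam * (p - y) ** 2 * w_sq
    return total / len(DATA)


def numerical_grad(f, w, h=1e-5):
    """Central finite-difference gradient of f at w."""
    grad = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def sam_step(w, lr=0.5, rho=0.05):
    """One SAM update: ascend to a nearby worst-case point, then descend."""
    g = numerical_grad(regularized_loss, w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # Perturb the weights along the normalized gradient (radius rho).
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Apply the gradient measured at the perturbed weights to the original w.
    g_sam = numerical_grad(regularized_loss, w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_sam)]


w = [0.0, 0.0]
initial_loss = regularized_loss(w)
for _ in range(20):
    w = sam_step(w)
final_loss = regularized_loss(w)
print(initial_loss, final_loss)
```

In this sketch the input-gradient penalty acts in the input space while the SAM perturbation acts in the weight space, which is why the two can be applied together in a single update.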

Paper Structure (29 sections, 5 theorems, 31 equations, 10 figures, 13 tables)


Introduction
Explaining Little Robustness
Transferability Circuit of Adversarial Training
Lower Bound of Transferability
Trade-off Under Adversarial Training
Investigating Data Augmentation
Data Augmentation Mechanisms
Data Augmentation Impairs Similarity
Trade-off Under Data Augmentation
Investigating Gradient Regularization
Gradient Regularization Mechanisms
Regularization in the Input Space
Regularization in the Weight Space
Gradient Regularization Promotes Smoothness
Trade-off Under Gradient Regularization
...and 14 more sections

Key Result

Theorem 1

Given any sample $(x, y) \in \mathcal{D}$, let $x+\delta$ denote a perturbed version of $x$ with fooling probability $\operatorname{Pr}(\mathcal{F}(x+\delta) \neq y) \geq (1-\alpha)$ and perturbation budget $\|\delta\|_{2} \leq \epsilon$. Then the transferability $\operatorname{Pr}\left(T_{r}(\ldots)\right)$ ... and $\epsilon$ be sufficiently large such that $\epsilon > c_{\mathcal{G}}$ and $\epsilon > \ldots$

Figures (10)

Figure 1: A complete overview of the relationship between factors regulating adversarial transferability of our study.
Figure 2: Transfer attack success rates (ASRs) against adversarially trained CIFAR-10 and ImageNette classifiers. We plot the average results of different surrogates obtained by 3 random seeds and the corresponding error bars for each $\epsilon$.
Figure 4: Average gradient similarities between $\epsilon$-robust models and different target models with corresponding error bars at each $\epsilon$ over 3 different random seeds.
Figure 5: Gradient similarities between augmented surrogate and ST target CIFAR-10 models with error bars at each $\tau$.
Figure 6: Gradient similarities between augmented surrogate and ST target ImageNette models with error bars at each $\tau$.
...and 5 more figures

Theorems & Definitions (13)

Definition 1: Model smoothness
Definition 2: Gradient similarity
Definition 3: Transferability
Theorem 1: Lower bound of transferability
Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
...and 3 more
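
Definition 2 is listed above by name only. As a rough illustration, and assuming that gradient similarity means the cosine similarity between the input gradients of the surrogate and target losses (this page does not reproduce the formal definition), it could be computed as in the sketch below. The two toy logistic models, their weights, and the helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(w, x, y):
    """Gradient of the logistic cross-entropy loss with respect to the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * wi for wi in w]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical surrogate and target weights, and one labeled sample.
w_surrogate = [1.0, 0.0]
w_target = [1.0, 1.0]
x, y = [0.5, -0.2], 1

g_s = input_gradient(w_surrogate, x, y)
g_t = input_gradient(w_target, x, y)
similarity = cosine_similarity(g_s, g_t)
print(similarity)
```

A value near 1 means the two models' loss gradients point in nearly the same input direction, so a perturbation that raises the surrogate's loss also tends to raise the target's.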
