Enhancing Adversarial Transferability with Adversarial Weight Tuning

Jiahao Chen; Zhou Feng; Rui Zeng; Yuwen Pu; Chunyi Zhou; Yi Jiang; Yuyou Gan; Jinbao Li; Shouling Ji

TL;DR

This work addresses adversarial example transferability across models with different architectures by proposing Adversarial Weight Tuning (AWT), a data-free bi-level optimization framework that jointly perturbs inputs and tunes surrogate-model parameters to create flatter, more transferable loss landscapes. The authors establish theoretical links between transferability, model smoothness, and flat local maxima, and they operationalize these insights through AWT, which minimizes a combined loss that promotes flatness in both input and parameter spaces without requiring external data. Extensive ImageNet-scale experiments show that AWT improves transferability on both CNN- and Transformer-based models, surpasses state-of-the-art gradient-based attacks, and strengthens other attacks when combined with them. The authors also introduce a transferability metric and discuss its relationship to empirical results, while acknowledging limitations and practical considerations in real-world scenarios.

Abstract

Deep neural networks (DNNs) are vulnerable to adversarial examples (AEs) that mislead the model while appearing benign to human observers. A critical concern is the transferability of AEs, which enables black-box attacks without direct access to the target model. However, many previous attacks have failed to explain the intrinsic mechanism of adversarial transferability. In this paper, we rethink the properties of transferable AEs and reformulate the definition of transferability. Building on insights from this mechanism, we analyze the generalization of AEs across models with different architectures and prove that we can find a local perturbation to mitigate the gap between surrogate and target models. We further establish the connection between model smoothness and flat local maxima, both of which contribute to the transferability of AEs. We then propose a new adversarial attack algorithm, Adversarial Weight Tuning (AWT), which adaptively adjusts the parameters of the surrogate model using generated AEs to optimize the flat local maxima and model smoothness simultaneously, without the need for extra data. AWT is a data-free tuning method that combines gradient-based and model-based attack methods to enhance the transferability of AEs. Extensive experiments on a variety of models with different architectures on ImageNet demonstrate that AWT outperforms other attacks, with average attack success rate increases of nearly 5% and 10% on CNN-based and Transformer-based models, respectively, compared to state-of-the-art attacks. Code available at https://github.com/xaddwell/AWT.
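
The alternating structure described above (perturb the input, tune the surrogate weights on the generated AE, repeat) can be illustrated with a toy loop. This is only a sketch under strong assumptions and is not the authors' algorithm: the surrogate is a hypothetical binary logistic model, the weight step is a sharpness-aware (SAM-style) descent step used as a stand-in for the paper's flatness objective, and every name and hyperparameter (`toy_alternating_attack`, `eps`, `alpha`, `lr`, `rho`) is invented for illustration. The released code at the URL above is the reference for the actual method.

```python
import numpy as np

def loss_and_grads(w, x, y):
    """Logistic loss for a label y in {0, 1}, plus gradients w.r.t. w and x."""
    z = float(w @ x)
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    dz = p - y                      # derivative of the loss w.r.t. the logit
    return loss, dz * x, dz * w     # (loss, grad w.r.t. w, grad w.r.t. x)

def toy_alternating_attack(w, x, y, eps=0.1, alpha=0.02, steps=10,
                           lr=0.01, rho=0.05):
    """Alternate a SAM-style weight step with a sign-gradient input step.

    Assumption: this mimics only the alternating structure; the paper's
    actual loss terms and update rules may differ.
    """
    w = w.copy()                    # tune a copy; the caller's weights stay intact
    x_adv = x.copy()
    for _ in range(steps):
        # Weight step: take the gradient at a nearby perturbed point
        # (w + rho * g / ||g||), which favors flatter regions in parameter space.
        _, g_w, _ = loss_and_grads(w, x_adv, y)
        w_pert = w + rho * g_w / (np.linalg.norm(g_w) + 1e-12)
        _, g_w_pert, _ = loss_and_grads(w_pert, x_adv, y)
        w = w - lr * g_w_pert

        # Input step: sign-gradient ascent on the loss, then projection
        # back into the L-infinity ball of radius eps around x.
        _, _, g_x = loss_and_grads(w, x_adv, y)
        x_adv = x_adv + alpha * np.sign(g_x)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv, w

# Hypothetical toy inputs, chosen only to exercise the loop.
w0 = np.array([1.0, -2.0, 0.5])
x0 = np.array([0.5, -0.5, 1.0])
x_adv, w_tuned = toy_alternating_attack(w0, x0, y=1)
```

In this toy setting the perturbation stays inside the `eps` ball and raises the loss under the original weights `w0`; whether such a perturbation transfers to a different model is exactly the question the paper studies and is not demonstrated by this sketch.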

Paper Structure (21 sections, 4 theorems, 34 equations, 4 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Gradient-based Adversarial Attack
Model-based Adversarial Attack
Others
Methodology
Preliminaries
Enhance Transferability with Local (Min)Maxima
Evaluation
Experimental Settings
Measurement of Adversarial Transferability
Experimental Results
Conclusion
Acknowledgement
Proof of the Proposition
...and 6 more sections

Key Result

Proposition 1

Given a model $f_{\theta}$ with parameters $\theta$, and a perturbation $\eta$ such that $\|\eta\|_{p} \leq \kappa$ ($\kappa \rightarrow 0$), we want to prove that $\forall x\in\mathcal{D}$ and $\gamma \rightarrow 0$, there exists a perturbation $\delta$ such that:

Figures (4)

Figure 1: The motivation for the proposed AWT, which achieves flat local minima by adjusting the surrogate model's parameters $\theta^{s}$ to enable more transferable AEs. The plot illustrates that a low $\nabla_{x^{\prime}}\ell(x^{\prime},y;\theta^{s})$ can be achieved through both the input and parameter spaces. The colored dots represent generated AEs with different loss values on the target model $\theta^{t}$.
Figure 2: Correlation between the $L_2$ norms of the sample and parameter gradients (with normalization: $g=\frac{g-\mu}{\sigma}$).
Figure 3: Relationship between the proposed metric and experimental transferability on VGG16 (target model). Each dot represents an adversarial sample. Note that AEs are generated on RN50 and the amplitude $\varepsilon$ of the parameter perturbation is set to $0.05$.
Figure 4: Correlation between the $L_2$ norms of the sample and parameter gradients (with normalization: $g=\frac{g-\mu}{\sigma}$). The first row includes the models VGG16, ResNet50 and InceptionV4. The second row includes the models Swin-s, ConViT and DeiT-B.
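
The normalization $g=\frac{g-\mu}{\sigma}$ in the captions of Figures 2 and 4 is a standard z-score. As a small illustration, the sketch below normalizes two lists of gradient norms and computes their Pearson correlation, which is one common way to quantify such a relationship; the captions do not state which correlation measure the authors use, so that choice is an assumption. The numbers are made up for illustration and are not results from the paper.

```python
import numpy as np

def zscore(g):
    """Normalize values to zero mean and unit (population) standard deviation."""
    g = np.asarray(g, dtype=float)
    return (g - g.mean()) / g.std()

def pearson(a, b):
    """Pearson correlation, computed as the mean product of z-scores."""
    return float(np.mean(zscore(a) * zscore(b)))

# Hypothetical L2 norms of input gradients and parameter gradients,
# one pair per adversarial sample (illustrative values only).
sample_grad_norms = [0.8, 1.1, 0.5, 1.9, 1.4]
param_grad_norms = [2.1, 2.9, 1.6, 4.2, 3.5]

r = pearson(sample_grad_norms, param_grad_norms)
```

Because both series are z-scored first, the correlation is unaffected by the very different scales that input and parameter gradient norms typically have.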

Theorems & Definitions (4)

Proposition 1
Proposition 2
Proposition 1
Proposition 2
