Transferability Bound Theory: Exploring Relationship between Adversarial Transferability and Flatness

Mingyuan Fan; Xiaodan Li; Cen Chen; Wenmeng Zhou; Yaliang Li

TL;DR

This work questions the widely held view that flatter adversarial examples naturally yield better transferability by deriving a theoretical bound on transferability and showing that flatness alone is insufficient. It decouples transferability into a local effectiveness term and a transfer-related boundary term, and proves a bound involving inherent model differences, first-order gradients, and second-order gradient components. Building on this theory, the authors propose the Theoretically Provable Attack (TPA), which optimizes a first-order gradient-based surrogate of the bound to craft more transferable adversarial examples, avoiding expensive higher-order gradients via a Hessian-free approximation. Empirically, TPA achieves better transferability than state-of-the-art baselines on ImageNet across many models, including transformer architectures, and in several real-world applications (Google Vision, search engines, GPT-4, Claude 3), underscoring both its practical impact and the need for robust defenses. Overall, the paper provides a solid theoretical foundation and a practical, scalable attack that may recalibrate how researchers think about flatness and transferability in adversarial settings.
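The TL;DR notes that TPA avoids higher-order gradients through a Hessian-free approximation. The exact surrogate TPA optimizes is not given here, so the following is only a generic sketch of one common Hessian-free trick, not the paper's method: the gradient of a gradient-norm penalty, $\nabla_x \lVert \nabla f(x) \rVert_2 = H(x)\,g/\lVert g \rVert_2$, is approximated with a finite difference of two first-order gradients, so the Hessian $H$ is never formed. The toy loss, the coefficients `C`, and the step size `r` are illustrative assumptions.

```python
import math

# Toy differentiable loss on R^3 (an illustrative assumption, not the attack
# loss from the paper): f(x) = 0.5 * sum(c_i * x_i^2) + 0.25 * x_0^4
C = [1.0, 3.0, 0.5]

def loss(x):
    return 0.5 * sum(c * xi * xi for c, xi in zip(C, x)) + 0.25 * x[0] ** 4

def grad(x):
    # Analytic first-order gradient of loss().
    g = [c * xi for c, xi in zip(C, x)]
    g[0] += x[0] ** 3
    return g

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def hessian_free_hvp(x, v, r=1e-4):
    """Approximate H(x) @ v with one extra gradient evaluation using a
    forward finite difference: (grad(x + r*v) - grad(x)) / r."""
    g0 = grad(x)
    g1 = grad([xi + r * vi for xi, vi in zip(x, v)])
    return [(a - b) / r for a, b in zip(g1, g0)]

def grad_of_grad_norm(x, r=1e-4):
    """Approximate d/dx ||grad(x)||_2 = H(x) g / ||g|| without forming H."""
    g = grad(x)
    gn = norm(g)
    unit = [gi / gn for gi in g]
    return hessian_free_hvp(x, unit, r)

def exact_grad_of_grad_norm(x):
    # Reference value using the known diagonal Hessian of the toy loss.
    g = grad(x)
    gn = norm(g)
    h_diag = [C[0] + 3 * x[0] ** 2] + C[1:]
    return [h * gi / gn for h, gi in zip(h_diag, g)]

if __name__ == "__main__":
    x = [1.0, -0.5, 2.0]
    print("finite-difference:", grad_of_grad_norm(x))
    print("exact:            ", exact_grad_of_grad_norm(x))
```

The cost is one extra gradient call per step instead of a second backward pass through the gradient; the forward difference has an error of order `r`, which a central difference would reduce at the price of another gradient call.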

Abstract

A prevailing belief in the attack and defense community is that higher flatness of adversarial examples enables better cross-model transferability, which has led to growing interest in employing sharpness-aware minimization and its variants. However, the theoretical relationship between the transferability of adversarial examples and their flatness has not been well established, which makes this belief questionable. To bridge this gap, we embark on a theoretical investigation and, for the first time, derive a theoretical bound for the transferability of adversarial examples under few practical assumptions. Our analysis challenges this belief by demonstrating that increased flatness of adversarial examples does not necessarily guarantee improved transferability. Moreover, building upon the theoretical analysis, we propose TPA, a Theoretically Provable Attack that optimizes a surrogate of the derived bound to craft adversarial examples. Extensive experiments across widely used benchmark datasets and various real-world applications show that TPA can craft more transferable adversarial examples than state-of-the-art baselines. We hope that these results can recalibrate preconceived impressions within the community and facilitate the development of stronger adversarial attack and defense mechanisms. The source code is available at <https://github.com/fmy266/TPA>.
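The abstract refers to the practice of borrowing sharpness-aware minimization (SAM) to seek flat adversarial examples, the very practice whose theoretical justification the paper questions. As a hedged illustration of that practice (not of TPA), the sketch below adds a SAM-style inner step to an L∞ sign-gradient attack. Because the attacker maximizes the loss, the worst-case neighbor for flatness is the one with the lowest loss, so each iteration first steps down the gradient by radius `rho` and then ascends using the gradient taken there. The toy objective and the values of `eps`, `alpha`, `rho`, and `steps` are assumptions for illustration only.

```python
import math

# Toy attack objective to be *maximized* (an illustrative assumption, not a
# real classifier loss): L(x) = sum_i sin(3 * x_i)
def attack_loss(x):
    return sum(math.sin(3 * xi) for xi in x)

def attack_grad(x):
    return [3 * math.cos(3 * xi) for xi in x]

def sign(v):
    # Returns +1, 0, or -1.
    return (v > 0) - (v < 0)

def project_linf(x, center, eps, lo=0.0, hi=1.0):
    """Clip x into the L-infinity ball of radius eps around center,
    then into the valid pixel range [lo, hi]."""
    return [min(hi, max(lo, min(c + eps, max(c - eps, xi))))
            for xi, c in zip(x, center)]

def flat_attack(x_clean, eps=8 / 255, alpha=2 / 255, rho=0.05, steps=10):
    """Sign-gradient attack with a SAM-style min-max inner step."""
    x = list(x_clean)
    for _ in range(steps):
        g = attack_grad(x)
        gn = math.sqrt(sum(gi * gi for gi in g)) or 1.0
        # Inner step: move toward lower loss (the "sharpest" neighbor).
        neighbor = [xi - rho * gi / gn for xi, gi in zip(x, g)]
        # Outer step: ascend with the gradient evaluated at that neighbor.
        g_flat = attack_grad(neighbor)
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g_flat)]
        x = project_linf(x, x_clean, eps)
    return x

if __name__ == "__main__":
    x0 = [0.2, 0.5, 0.9]
    adv = flat_attack(x0)
    print("loss before:", attack_loss(x0), "after:", attack_loss(adv))
```

Per the paper's analysis, an example found this way may be flatter without being more transferable; the sketch only shows what the flatness-seeking step looks like.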

Paper Structure (24 sections, 23 equations, 7 figures, 6 tables)

Introduction
Related Work
Flatness-based Optimization Methods
Transferability-enhancing Methods
Theoretical Analysis
The Proposed Approach TPA
Overview
Optimization Formulation
Approximate Solution
Simulation Experiment
Setup
Attack Results
Evaluation in Real World Applications
Ending Remark
Acknowledgments
...and 9 more sections

Figures (7)

Figure 1: Visualization of the first-order and second-order gradients of $y=\sin x^2$. The black stars mark the locations where $y_1$ and $y_3$ attain their minimum values.
Figure 2: The attack effectiveness of TPA with varying perturbation budgets. The adversarial examples are crafted on ResNet50.
Figure 3: Example.
Figure 4: We conduct targeted attacks and visualize the target model's attention maps on the resulting adversarial images.
Figure 5: The attack effectiveness of TPA with varying $\lambda \in \{0.1, 0.5, 1, 5, 10\}, b \in \{1,2,4,8,12,16\},k \in \{0.01, 0.03, 0.05, 0.07, 0.09\},N \in \{5,10,15,20\}$. The proxy model is ResNet50. We set $\epsilon=8$.
...and 2 more figures
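Figure 1 plots the first- and second-order gradients of $y=\sin x^2$. Reading the function as $y=\sin(x^2)$ (an assumption; the caption's notation could also mean $(\sin x)^2$), the derivatives are $y' = 2x\cos(x^2)$ and $y'' = 2\cos(x^2) - 4x^2\sin(x^2)$. The short sketch below evaluates them and checks each against a central finite difference; it does not reproduce the figure or identify $y_1$ and $y_3$.

```python
import math

# Assumption: Figure 1's y = sin x^2 is read as y = sin(x**2).
def y(x):
    return math.sin(x * x)

def dy(x):
    # First-order gradient: d/dx sin(x^2) = 2x cos(x^2)
    return 2 * x * math.cos(x * x)

def d2y(x):
    # Second-order gradient: d/dx [2x cos(x^2)] = 2cos(x^2) - 4x^2 sin(x^2)
    return 2 * math.cos(x * x) - 4 * x * x * math.sin(x * x)

def central_diff(f, x, h=1e-5):
    # Numerical derivative via a central finite difference (error O(h^2)).
    return (f(x + h) - f(x - h)) / (2 * h)

if __name__ == "__main__":
    for x in [-2.0, -1.0, 0.0, 0.5, 1.5, 2.5]:
        print(f"x={x:+.1f}  y={y(x):+.4f}  y'={dy(x):+.4f}  y''={d2y(x):+.4f}")
```

The second derivative oscillates with growing amplitude as $|x|$ increases, which is why curvature-based (second-order) quantities can behave quite differently from the first-order gradient at the same point.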
