Boosting Adversarial Transferability with Low-Cost Optimization via Maximin Expected Flatness

Chunlin Qiu, Ang Li, Yiheng Duan, Shenyi Zhang, Yuanjie Zhang, Lingchen Zhao, Qian Wang

TL;DR

This work studies transfer-based adversarial attacks on deep neural networks and gives a principled theory connecting loss-surface flatness to transferability. It unifies fragmented flatness notions into a multi-order framework and proves that zeroth-order average-case flatness governs cross-model transferability, which enables efficient attack design. Building on this theory, the authors introduce Maximin Expected Flatness (MEF), which combines Neighborhood Conditional Sampling and Gradient Balancing Optimization to find adversarial perturbations in flat regions of the loss surface at reduced computational cost. Empirically, across 22 models and 24 transfer-based attacks, MEF achieves state-of-the-art transferability, surpasses the previous best attack (PGN) at half its computational cost, and remains effective against defense-equipped models and real-world APIs, underscoring the practical threat of geometry-aware attacks. The work advances the theoretical understanding of flatness in adversarial transfer and contributes a practical, efficient attack method with broad implications for AI security.

Abstract

Transfer-based attacks craft adversarial examples on white-box surrogate models and directly deploy them against black-box target models, offering model-agnostic and query-free threat scenarios. While flatness-enhanced methods have recently emerged to improve transferability by enhancing the loss surface flatness of adversarial examples, their divergent flatness definitions and heuristic attack designs suffer from unexamined optimization limitations and missing theoretical foundation, thus constraining their effectiveness and efficiency. This work exposes the severely imbalanced exploitation-exploration dynamics in flatness optimization, establishing the first theoretical foundation for flatness-based transferability and proposing a principled framework to overcome these optimization pitfalls. Specifically, we systematically unify fragmented flatness definitions across existing methods, revealing their imbalanced optimization limitations in over-exploration of sensitivity peaks or over-exploitation of local plateaus. To resolve these issues, we rigorously formalize average-case flatness and transferability gaps, proving that enhancing zeroth-order average-case flatness minimizes cross-model discrepancies. Building on this theory, we design a Maximin Expected Flatness (MEF) attack that enhances zeroth-order average-case flatness while balancing flatness exploration and exploitation. Extensive evaluations across 22 models and 24 current transfer-based attacks demonstrate MEF's superiority: it surpasses the state-of-the-art PGN attack by 4% in attack success rate at half the computational cost and achieves 8% higher success rate under the same budget. When combined with input augmentation, MEF attains 15% additional gains against defense-equipped models, establishing new robustness benchmarks. Our code is available at https://github.com/SignedQiu/MEFAttack.

Paper Structure (55 sections, 1 theorem, 19 equations, 13 figures, 19 tables, 1 algorithm)

Key Result

Theorem 1

Let $F$ and $F'$ be two models with loss $J(\mathbf x,y;\,\cdot\,)$. Assume $J$ is locally approximable by a Taylor expansion of order $N$. For $0 \le n \le N$, let $\bar{R}^{(n)}_{\xi}(\mathbf x;F)$ and $\bar{R}^{(n)}_{\xi}(\mathbf x;F')$ be the $n$-order flatness of $F$ and $F'$. Then for any perturbation $\|\boldsymbol\delta\|\le\xi$, the adversarial transferability

Figures (13)

  • Figure 1: Visualization of adversarial loss landscapes for nine attacks on Res-50. The loss surfaces are constructed by perturbing adversarial examples along two random directions. Our MEF attack achieves the flattest loss landscape, indicating superior flatness-based transferability.
  • Figure 2: Optimization dynamics under flatness duality ($\overline{R}^{(n)}_\xi$ vs. $\widehat{R}^{(n)}_\xi$). Comparing zeroth- and first-order methods on 100 ImageNet samples, we measure inter-update gradient similarity and transfer attack success rates (Res-50$\rightarrow$Inc-v3). After ASR convergence, worst-case variants sustain lower gradient similarity (0.10-0.25) than average-case ones (0.38-0.88), revealing an exploration-exploitation trade-off governed by the flatness formalism.
  • Figure 3: Overview of the proposed MEF attack. (a) Illustration of the iterative optimization process. At step $t-1$, Neighborhood Conditional Sampling (NCS) transforms random samples ($x^r$) into worst-case neighborhood samples ($x^n$) via inner minimization. Outer maximization then aggregates gradients from these regions to update the adversarial example $x_t^{adv}$. (b) Geometric interpretation showing how MEF seeks flat maxima to minimize the Adversarial Transferability Gap (ATG) between the source model $F$ (solid line) and the target model $F'$ (dashed line), as opposed to sharp maxima which lead to larger gaps.
  • Figure 4: GBO's optimization superiority in transferability (Res-50$\rightarrow$Inc-v3). On 100 ImageNet samples, GBO reaches 90-98% transfer attack success rates within 10 iterations, whereas PGD reaches 84-90% at 100 steps: a 6-14% absolute improvement with 10× faster convergence.
  • Figure 5: Attack Success Rate (ASR) convergence curve over 100 iterations (Source: Res-50 $\rightarrow$ Target: Inc-v3). The dashed red line marks the conventional $T=10$ iteration cutoff.
  • ...and 8 more figures
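
The maximin loop described in the Figure 3 caption, together with the inter-update gradient similarity measured in Figure 2, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the toy loss, every function name, and all hyperparameter values (`eps`, `xi`, `gamma`, `n_samples`) are hypothetical, the inner minimization is assumed to be a single signed descent step, and Gradient Balancing Optimization is not modelled because the text here does not specify it.

```python
import math
import random

def toy_loss_grad(x):
    """Gradient of a hypothetical toy loss J(x) = sum(sin(3*x_i) + 0.5*x_i)."""
    return [3.0 * math.cos(3.0 * v) + 0.5 for v in x]

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def cosine_similarity(u, v):
    """Cosine similarity between two gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def maximin_step(x_adv, x_clean, grad_fn, eps, alpha, xi, gamma, n_samples, rng):
    """One outer ascent step over inner-minimized neighborhood samples."""
    agg = [0.0] * len(x_adv)
    for _ in range(n_samples):
        # Random neighbor x^r in the L_inf ball of radius xi around x_adv.
        x_r = [v + rng.uniform(-xi, xi) for v in x_adv]
        # Assumed inner minimization: one signed descent step toward lower
        # loss, clipped back into the xi-neighborhood, giving x^n.
        g_r = grad_fn(x_r)
        x_n = [clip(r - gamma * sign(g), c - xi, c + xi)
               for r, g, c in zip(x_r, g_r, x_adv)]
        # Average the gradients taken at the low-loss neighbors.
        g_n = grad_fn(x_n)
        agg = [a + g / n_samples for a, g in zip(agg, g_n)]
    # Outer maximization: signed ascent, projected onto the eps-ball
    # around the clean input.
    x_new = [clip(v + alpha * sign(a), c - eps, c + eps)
             for v, a, c in zip(x_adv, agg, x_clean)]
    return x_new, agg

def run_attack(x_clean, steps=10, eps=0.3, xi=0.1, n_samples=8, seed=0):
    """Run the sketch; return the final point and inter-update similarities."""
    rng = random.Random(seed)
    alpha = eps / steps
    x_adv, prev, sims = list(x_clean), None, []
    for _ in range(steps):
        x_adv, g = maximin_step(x_adv, x_clean, toy_loss_grad,
                                eps, alpha, xi, xi / 2, n_samples, rng)
        if prev is not None:
            sims.append(cosine_similarity(prev, g))
        prev = g
    return x_adv, sims
```

Calling `run_attack([0.2, -0.5, 1.0])` returns the perturbed point and the list of cosine similarities between consecutive aggregated gradients. In this toy setting the similarity values only show how such a diagnostic could be computed; they say nothing about the paper's reported ranges.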

Theorems & Definitions (4)

  • Definition 1: $\xi$-radius $n$-order Flatness
  • Definition 2: Adversarial Transferability Gap
  • Theorem 1: Flatness-based Bound on Transferability
  • Proof of Theorem 1