Beyond Deceptive Flatness: Dual-Order Solution for Strengthening Adversarial Transferability

Zhixuan Zhang, Pingyu Wang, Xingjian Zheng, Linbo Qing, Qi Liu

TL;DR

The paper identifies deceptive flatness as a barrier to transferable black-box adversarial attacks and proposes a dual-order framework that fuses zeroth- and first-order loss information through Adversarial Flatness (AF). It instantiates this objective as the Adversarial Flatness Attack (AFA) using a gradient approximation, and introduces MonteCarlo Adversarial Sampling (MCAS) to diversify inner-loop exploration, with a theoretical assurance for transferability. Evaluation on an ImageNet-compatible dataset and the Baidu Cloud API shows that AFA with MCAS outperforms six baselines across model architectures, ensembles, and defenses, and when combined with input transformations. The results underscore the value of exploiting flatter regions for transferability, while highlighting the need for robust defenses and future work on attention-shift issues in Transformer architectures.

Abstract

Transferable attacks generate adversarial examples on surrogate models to fool unknown victim models, posing real-world threats and attracting growing research interest. Although recent studies focus on flat losses for transferable adversarial examples, they still fall into suboptimal regions, especially flat-yet-sharp areas, which we term deceptive flatness. In this paper, we introduce a novel black-box gradient-based transferable attack from the perspective of dual-order information. Specifically, we propose Adversarial Flatness (AF) to address the deceptive flatness problem and provide a theoretical assurance for adversarial transferability. Based on this, using an efficient approximation of our objective, we instantiate our attack as the Adversarial Flatness Attack (AFA), which addresses the altered gradient sign issue. Additionally, to further improve attack ability, we devise MonteCarlo Adversarial Sampling (MCAS), which enhances inner-loop sampling efficiency. Comprehensive results on an ImageNet-compatible dataset demonstrate superiority over six baselines: our method generates adversarial examples in flatter regions and boosts transferability across model architectures. When combined with input transformation attacks or tested on the Baidu Cloud API, our method also outperforms the baselines.
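
To make the "flat-yet-sharp" idea concrete, the following is a minimal toy sketch, not the paper's AF objective or its MCAS procedure. It assumes a hypothetical 1-D loss with a small high-frequency ripple, and estimates two generic neighborhood measures by uniform random sampling: a zeroth-order one (largest change in loss value) and a first-order one (largest gradient magnitude). All function names, the toy loss, and the parameter values are illustrative assumptions.

```python
import numpy as np


def loss(x):
    # Hypothetical toy loss: almost constant, with a tiny fast ripple.
    return 1.0 + 0.01 * np.sin(200.0 * x)


def grad(x):
    # Analytic derivative of the toy loss above.
    return 2.0 * np.cos(200.0 * x)


def zeroth_order_flatness(f, x, radius, n_samples, rng):
    # Largest absolute change in loss value over sampled neighbors of x.
    deltas = rng.uniform(-radius, radius, size=n_samples)
    return float(np.max(np.abs(f(x + deltas) - f(x))))


def first_order_sharpness(g, x, radius, n_samples, rng):
    # Largest gradient magnitude over sampled neighbors of x.
    deltas = rng.uniform(-radius, radius, size=n_samples)
    return float(np.max(np.abs(g(x + deltas))))


rng = np.random.default_rng(0)
z = zeroth_order_flatness(loss, 0.3, 0.05, 1000, rng)
s = first_order_sharpness(grad, 0.3, 0.05, 1000, rng)
print(f"zeroth-order change: {z:.4f}, max gradient magnitude: {s:.4f}")
```

In this toy setting the loss value never moves by more than 0.02 inside the neighborhood, so a purely zeroth-order measure reports a flat region, while the gradient magnitude approaches 2.0. A measure that looks at only one order can therefore be misled, which is the kind of situation a dual-order view is meant to catch.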

Paper Structure

This paper contains 37 sections, 22 equations, 16 figures, 12 tables, and 1 algorithm.

Figures (16)

  • Figure 1: An example showing: (a)(b) the limitations of single-order flatness methods, and (c)(d) the improvements from our dual-order gradient fusion and diversified sampling strategy.
  • Figure 2: (a) The process of the transferable adversarial attack. (b) The overview of generating the adversarial example by our proposed Adversarial Flatness Attack, corresponding to stage 1 in (a). In (b), our attack method consists of MonteCarlo Adversarial Sampling and Dual-order Information Fusion. The dashed red line denotes information fusion flow. The dashed black line indicates the backward gradient from the white-box source model.
  • Figure 3: The attack success rates (%) and adversarial loss of MI and VMI on seven black-box models. The adversarial examples are generated on Inc-v3. The adversarial losses of MI-FGSM and VMI-FGSM are individually represented as MI_AdvLoss and VMI_AdvLoss. The attack success rates of MI-FGSM and VMI-FGSM are denoted as MI_ASR and VMI_ASR respectively.
  • Figure 4: Illustrations of the average gradients of our approximated objective over $T$ iterations (a) without and (b) with adding $g_1$, $g_2$ and $g_3$. The dashed line representing $g_{total}$ is our focus.
  • Figure 5: The attack success rates (%) of our AFA without or with adding $g_1$, $g_2$ and $g_3$ on six black-box models. The adversarial examples are generated on Inc-v3. Incv3$_{ens3}$, Incv3$_{ens4}$ and IncResv2$_{ens}$ are Inception-v3 and Inception-ResNet-v2 models trained with ensemble adversarial training.
  • ...and 11 more figures

Theorems & Definitions (2)

  • Definition 1
  • Definition 2