Perturbation Towards Easy Samples Improves Targeted Adversarial Transferability

Junqi Gao, Biqing Qi, Yao Li, Zhichang Guo, Dong Li, Yuming Xing, Dazhi Zhang

TL;DR

This work investigates targeted adversarial transferability by identifying High-Sample-Density-Regions (HSDR) as regions where different models behave consistently. It shows that perturbations toward the HSDR of the target class yield more transferable targeted attacks, aided by the observation that easy samples with low loss tend to lie in HSDR, enabling density-free guidance. The authors introduce ESMA, a multi-target, generative attack that uses a two-stage training strategy: pre-trained latent-feature embeddings guided by manifold-matching, and a Unet-based generator trained to map source samples toward easy-target anchors, allowing a single model to attack all classes with reduced storage and computation. Empirical results on ImageNet demonstrate ESMA’s superior targeted transferability compared to the state-of-the-art TTP, along with substantial efficiency gains, supported by ablations that validate the components and the Loss+Gradnorm screening concept. Overall, the work provides both practical attack tooling and theoretical insight into model output consistency in HSDR, with implications for dataset design and defense research.

Abstract

The transferability of adversarial perturbations provides an effective shortcut for black-box attacks. Targeted perturbations have greater practicality but are more difficult to transfer between models. In this paper, we experimentally and theoretically demonstrate that neural networks trained on the same dataset behave more consistently in High-Sample-Density-Regions (HSDR) of each class than in low-sample-density regions. Therefore, in the targeted setting, adding perturbations towards the HSDR of the target class is more effective in improving transferability. However, density estimation is challenging in high-dimensional scenarios. Further theoretical and experimental verification demonstrates that easy samples with low loss are more likely to be located in HSDR. Perturbing towards such easy samples in the target class therefore avoids the density estimation otherwise needed to locate HSDR. Based on these facts, we verify that adding perturbations towards easy samples in the target class improves the targeted adversarial transferability of existing attack methods. We propose a generative targeted attack strategy named Easy Sample Matching Attack (ESMA), which has a higher success rate for targeted attacks and outperforms the SOTA generative method. Moreover, ESMA requires only 5% of the storage space and much less computation time compared to the current SOTA, as ESMA attacks all classes with only one model instead of separate models for each class. Our code is available at https://github.com/gjq100/ESMA.

Paper Structure

This paper contains 32 sections, 7 theorems, 60 equations, 11 figures, 6 tables, and 3 algorithms.

Key Result

Theorem 1

For a target class $j\in\left[K\right]$ and two different parametrized classes $\mathcal{F}_{{\boldsymbol w}_1}:=\left\{f_{{\boldsymbol w}_1}:{\boldsymbol w}_1\in\mathcal{W}_1 \right\}$, $\mathcal{F}_{{\boldsymbol w}_2}:=\left\{f_{{\boldsymbol w}_2}:{\boldsymbol w}_2\in\mathcal{W}_2 \right\}$

Figures (11)

  • Figure 1: A schematic example of our motivation, plotting the probability density (darker color indicates higher density) and samples for three populations (orange, cyan, and green). The black line indicates the Bayesian discriminant boundary.
  • Figure 2: (a) Bayesian discriminant region; darker color indicates higher confidence probability. (b) Classifier discriminant region, with the probability density curves of the two population distributions plotted; the white part represents the low-density region of the ground-truth joint distribution, and the small pits in the Bayesian misclassified region are boxed. (c) Classifier discriminant region with samples; an outlier is boxed. (d) Output differences between three different classifiers; darker purple indicates a greater difference in output between the classifiers.
  • Figure 3: The first panel depicts the difference in output of three models under different local sample densities $\rho_{(y_i,x_i,r)}$, divided into bins. The second panel shows the local empirical risk $R_{(y_i,x_i,r)}$ of samples under different sums of loss and gradient norm (Loss+Gradnorm). For Loss+Gradnorm, we first normalize both variables separately and then add them up to eliminate magnitude differences. The third panel shows the local empirical risk at different local sample densities. The fourth panel displays the local density under different Loss+Gradnorm values. The neighborhood radius $r$ is taken as $0.4$.
  • Figure 4: Training strategy of ESMA.
  • Figure 5: Comparison of targeted attack transfer success rates with (w) pre-trained embeddings and without (w/o) pre-trained embeddings at different training epochs. Src:Res50.
  • ...and 6 more figures
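
The Loss+Gradnorm score described in the Figure 3 caption (normalize loss and gradient norm separately, then add them) can be sketched as follows. This is only an illustrative sketch: the caption does not say which normalization is used, so min-max scaling is an assumption here, and the function names (`min_max_normalize`, `loss_plus_gradnorm`, `easiest_indices`) and the sample values are hypothetical rather than taken from the paper or its code.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant sequence maps to all zeros.

    Assumption: min-max scaling. The source only says "normalize".
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def loss_plus_gradnorm(losses: Sequence[float],
                       grad_norms: Sequence[float]) -> List[float]:
    """Normalize each quantity separately, then add them per sample."""
    if len(losses) != len(grad_norms):
        raise ValueError("losses and grad_norms must have the same length")
    norm_loss = min_max_normalize(losses)
    norm_grad = min_max_normalize(grad_norms)
    return [l + g for l, g in zip(norm_loss, norm_grad)]


def easiest_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k samples with the smallest score (easy candidates)."""
    return sorted(range(len(scores)), key=lambda i: scores[i])[:k]


# Hypothetical per-sample values for one target class (not from the paper).
losses = [0.10, 2.00, 0.50, 1.00]
grad_norms = [1.0, 9.0, 3.0, 5.0]

scores = loss_plus_gradnorm(losses, grad_norms)
easy = easiest_indices(scores, k=2)
```

In practice the per-sample loss and gradient norm would come from a trained classifier; samples with the lowest combined score would then serve as candidate easy-sample anchors for the target class.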

Theorems & Definitions (9)

  • Definition 1: $(j,x_0,r)$-Local sample density
  • Theorem 1: Local output consistency
  • Theorem 2: Optimization relies on local density
  • Proposition 1
  • Definition 2: $(j,k,x_0,r)$-Local Output Rademacher Complexity
  • Lemma 1
  • Lemma 2: $l_\infty$ Contraction Inequality [37]
  • Lemma 3
  • Lemma 4
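
The formal statement of Definition 1 is not reproduced in this listing, so the following is only a rough empirical proxy for a $(j,x_0,r)$-local sample density, not the paper's definition: the fraction of class-$j$ samples that fall inside a Euclidean ball of radius $r$ around $x_0$. The Euclidean metric, the function name `local_sample_density`, and the toy data are all assumptions made for illustration; the radius $0.4$ matches the value quoted in the Figure 3 caption.

```python
import math
from typing import Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)


def local_sample_density(samples: Sequence[Sample], j: int,
                         x0: Sequence[float], r: float) -> float:
    """Fraction of class-j samples within Euclidean distance r of x0.

    Assumption: a simple counting proxy, not the paper's Definition 1.
    """
    class_points = [x for x, y in samples if y == j]
    if not class_points:
        return 0.0
    inside = sum(1 for x in class_points if math.dist(x, x0) <= r)
    return inside / len(class_points)


# Toy 2-D data (hypothetical): four class-0 points and one class-1 point.
samples = [
    ([0.0, 0.0], 0),
    ([0.1, 0.0], 0),
    ([0.0, 0.3], 0),
    ([2.0, 2.0], 0),
    ([0.0, 0.1], 1),
]

dense = local_sample_density(samples, j=0, x0=[0.0, 0.0], r=0.4)
sparse = local_sample_density(samples, j=0, x0=[2.0, 2.0], r=0.4)
```

Here the point at the origin sits in a comparatively dense neighborhood of class 0, while the point at (2, 2) is isolated, which is the kind of contrast the HSDR argument relies on.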