
Improving the Transferability of Adversarial Examples by Feature Augmentation

Donghua Wang, Wen Yao, Tingsong Jiang, Xiaohu Zheng, Junqi Wu, Xiaoqian Chen

TL;DR

Transfer-based adversarial attacks often fail to generalize across models due to architectural discrepancies. The authors introduce FAUG, a simple feature augmentation technique that injects zero-mean Gaussian noise into an intermediate feature of the model, $\hat{f}^i_\phi = f^i_\phi + \eta$ with $\eta \sim \mathcal{N}(0,\sigma)$, to diversify the attack gradient and boost cross-model transferability without added computation. FAUG is compatible with gradient-based attacks such as MIFGSM through the standard update $x^{(t+1)}_{adv}=x^{(t)}_{adv}+\alpha\,\text{sign}(g_{t+1})$, where $g_{t+1}=\xi g_t + \frac{\nabla_x \mathcal{L}(\hat{f}_\phi(x^{(t)}_{adv}), y)}{\|\nabla_x \mathcal{L}(\hat{f}_\phi(x^{(t)}_{adv}), y)\|_1}$. Extensive ImageNet experiments across CNNs and Vision Transformers show that FAUG improves average black-box transferability (e.g., 59.96% versus 52.39% for baselines) and yields notable gains when combined with advanced gradient attacks and ensemble strategies. Ablations highlight the importance of layer selection and noise strength. The work presents FAUG as a lightweight, broadly compatible method for enhancing adversarial transferability, with implications for defenses and adversarial training.

Abstract

Despite the success of input transformation-based attacks in boosting adversarial transferability, their performance remains unsatisfactory because they ignore the discrepancy across models. In this paper, we propose a simple but effective feature augmentation attack (FAUG) method, which improves adversarial transferability without introducing extra computation costs. Specifically, we inject random noise into the intermediate features of the model to enlarge the diversity of the attack gradient, thereby mitigating the risk of overfitting to the specific model and notably amplifying adversarial transferability. Moreover, our method can be combined with existing gradient attacks to further augment their performance. Extensive experiments conducted on the ImageNet dataset across CNN and transformer models corroborate the efficacy of our method, e.g., we achieve improvements of +26.22% and +5.57% on input transformation-based attacks and combination methods, respectively.
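
To make the mechanism concrete, the following is a minimal sketch of feature-noise injection inside a momentum-based iterative attack, following the update summarized in the TL;DR. It is not the authors' implementation. The two-layer toy model (`W1`, `W2`, `tanh` feature), the function names, the step size `alpha = eps / steps`, the L-infinity projection, and the reading of $\sigma$ as a standard deviation are all assumptions made for illustration. The gradient is derived by hand so the example needs only the Python standard library.

```python
import math
import random

def matvec(W, v):
    """Compute W v for a matrix given as a list of rows."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]

def mat_t_vec(W, v):
    """Compute W^T v for a matrix given as a list of rows."""
    return [sum(W[i][j] * v[i] for i in range(len(W))) for j in range(len(W[0]))]

def softmax(z):
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def faug_input_grad(x, y, W1, W2, sigma, rng):
    """Gradient of the cross-entropy loss w.r.t. x for the toy model
    logits = W2 (tanh(W1 x) + eta), with eta drawn fresh on every call."""
    f = [math.tanh(a) for a in matvec(W1, x)]           # intermediate feature f
    f_hat = [fi + rng.gauss(0.0, sigma) for fi in f]    # f_hat = f + eta (sigma as std: assumption)
    p = softmax(matvec(W2, f_hat))
    dz = [pi - (1.0 if k == y else 0.0) for k, pi in enumerate(p)]  # dL/dlogits
    df = mat_t_vec(W2, dz)                              # dL/df_hat = dL/df (additive noise)
    da = [d * (1.0 - fi * fi) for d, fi in zip(df, f)]  # back through tanh
    return mat_t_vec(W1, da)                            # dL/dx

def faug_mifgsm(x, y, W1, W2, eps=0.1, steps=10, xi=1.0, sigma=0.3, seed=0):
    """Momentum iterative sign attack using gradients of the noise-augmented model."""
    rng = random.Random(seed)
    alpha = eps / steps          # assumed step size
    x_adv = list(x)
    g = [0.0] * len(x)
    for _ in range(steps):
        grad = faug_input_grad(x_adv, y, W1, W2, sigma, rng)
        l1 = sum(abs(v) for v in grad) or 1.0           # guard against a zero gradient
        g = [xi * gi + v / l1 for gi, v in zip(g, grad)]  # g_{t+1} = xi * g_t + grad / ||grad||_1
        x_adv = [xa + alpha * ((gi > 0) - (gi < 0)) for xa, gi in zip(x_adv, g)]
        # Assumed constraint: keep the perturbation inside the eps-ball around x.
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
    return x_adv
```

Because the noise enters only the forward pass that is already being computed, each iteration still costs one forward and one backward pass, which is consistent with the claim of no extra computation. Setting `sigma=0.0` reduces the sketch to the plain momentum attack on the toy model.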

Paper Structure (17 sections, 3 equations, 4 figures, 4 tables, 1 algorithm)

Figures (4)

  • Figure 1: Influence of the type of feature augmentation on ResNet50 (abbr. RN50) performance.
  • Figure 2: Influence of layer selection on attack performance.
  • Figure 3: Influence of the type of feature augmentation on attack performance.
  • Figure 4: Influence of random noise strength on attack performance at the specific layer.