Improving the Transferability of Adversarial Examples by Feature Augmentation
Donghua Wang, Wen Yao, Tingsong Jiang, Xiaohu Zheng, Junqi Wu, Xiaoqian Chen
TL;DR
Transfer-based adversarial attacks often fail to generalize across models because of architectural discrepancies. The authors introduce FAUG, a simple feature augmentation technique that injects zero-mean Gaussian noise into an intermediate feature of the model, $\hat{f}^i_\phi = f^i_\phi + \eta$ with $\eta \sim \mathcal{N}(0,\sigma)$. This diversifies the attack gradient and boosts cross-model transferability at no extra computational cost. FAUG is compatible with gradient-based attacks such as MI-FGSM, which uses the standard update $x^{(t+1)}_{adv}=x^{(t)}_{adv}+\alpha\,\operatorname{sign}(g_{t+1})$ with $g_{t+1}=\xi g_t + \frac{\nabla_x \mathcal{L}(\hat{f}_\phi(x^{(t)}_{adv}), y)}{\|\nabla_x \mathcal{L}(\hat{f}_\phi(x^{(t)}_{adv}), y)\|_1}$, where $\hat{f}_\phi$ denotes the model with the augmented intermediate feature. Extensive ImageNet experiments across CNNs and Vision Transformers show that FAUG improves average black-box transferability (e.g., 59.96% vs. 52.39% for the baselines) and yields notable gains when combined with advanced gradient attacks and ensemble strategies, while ablations highlight the importance of layer selection and noise strength. The work positions FAUG as a lightweight, broadly compatible way to enhance adversarial transferability, with implications for defense design and adversarial training.
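The feature augmentation and the MI-FGSM update described in the TL;DR can be sketched in NumPy on a toy model. This is a minimal illustration under stated assumptions, not the authors' implementation. The two-layer ReLU surrogate, the helper names `loss_and_grad` and `mifgsm_faug`, the step size $\alpha = \epsilon / T$, the projection onto the $L_\infty$ ball, and reading $\sigma$ as a standard deviation are all assumptions made for the sketch. A fresh $\eta$ is drawn at every iteration and only perturbs the intermediate feature; the resulting input gradient then feeds the usual momentum update.

```python
import numpy as np


def loss_and_grad(x, y, W1, W2, sigma, rng):
    """Cross-entropy loss of a hypothetical two-layer ReLU surrogate and its
    gradient w.r.t. the input x, with FAUG-style noise on the hidden feature.

    Assumption: sigma is used as the standard deviation of eta.
    """
    f = W1 @ x                                         # intermediate feature f^i
    f_hat = f + sigma * rng.standard_normal(f.shape)   # augmented feature f^i + eta
    h = np.maximum(f_hat, 0.0)                         # ReLU
    logits = W2 @ h
    z = logits - logits.max()                          # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    loss = -np.log(p[y])
    # Manual backprop. eta is additive, so d f_hat / d f is the identity.
    d_logits = p.copy()
    d_logits[y] -= 1.0                                 # dL/dlogits = p - onehot(y)
    d_f = (W2.T @ d_logits) * (f_hat > 0)              # through W2 and the ReLU mask
    return loss, W1.T @ d_f                            # chain rule back to x


def mifgsm_faug(x, y, W1, W2, eps=0.03, steps=10, xi=1.0, sigma=0.5, seed=0):
    """Hypothetical MI-FGSM loop whose gradients come from the noise-augmented model."""
    rng = np.random.default_rng(seed)
    alpha = eps / steps                                # assumed step size
    x_adv, g = x.copy(), np.zeros_like(x)
    for _ in range(steps):
        _, grad = loss_and_grad(x_adv, y, W1, W2, sigma, rng)  # fresh eta each step
        g = xi * g + grad / (np.abs(grad).sum() + 1e-12)       # L1-normalised momentum
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)               # project onto L_inf ball
    return x_adv


# Hypothetical toy surrogate: 8-d input, 16 hidden units, 5 classes.
rng = np.random.default_rng(1)
W1 = 0.3 * rng.standard_normal((16, 8))
W2 = 0.3 * rng.standard_normal((5, 16))
x = rng.standard_normal(8)
x_adv = mifgsm_faug(x, y=2, W1=W1, W2=W2)
print("max |x_adv - x| =", np.abs(x_adv - x).max())
```

In this sketch, setting `sigma=0.0` recovers plain MI-FGSM on the same surrogate, which makes a side-by-side comparison easy. The augmentation adds only one vector addition per forward pass, so the per-iteration cost of the loop is essentially unchanged.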
Abstract
Despite the success of input transformation-based attacks in boosting adversarial transferability, their performance remains unsatisfactory because they ignore the discrepancy across models. In this paper, we propose a simple but effective feature augmentation attack (FAUG) method, which improves adversarial transferability without introducing extra computational cost. Specifically, we inject random noise into the intermediate features of the model to enlarge the diversity of the attack gradient, thereby mitigating the risk of overfitting to the specific model and notably amplifying adversarial transferability. Moreover, our method can be combined with existing gradient attacks to further augment their performance. Extensive experiments conducted on the ImageNet dataset across CNN and transformer models corroborate the efficacy of our method; e.g., we achieve improvements of +26.22% and +5.57% on input transformation-based attacks and combination methods, respectively.
