Exploring Frequencies via Feature Mixing and Meta-Learning for Improving Adversarial Transferability

Juanjuan Weng, Zhiming Luo, Shaozi Li

TL;DR

The paper studies adversarial transferability from a frequency perspective: high-frequency perturbations strongly affect normally trained models, while low-frequency information improves transferability to black-box models. It introduces a frequency-decomposition-based feature mixing framework that uses the low-frequency component $x_l$ and the high-frequency component $x_h$ of an image, in two variants: Adversarial Features Mixing (AFM) and Low-Frequency Adversarial Features Mixing (LF-AFM). Because the two variants conflict when applied together, the paper couples them through a cross-frequency meta-optimization. On the ImageNet-Compatible dataset, the method improves transferability against both normally trained CNNs and defense models, including transformer architectures. The source code is publicly available.

Abstract

Recent studies have shown that Deep Neural Networks (DNNs) are susceptible to adversarial attacks, with frequency-domain analysis underscoring the significance of high-frequency components in influencing model predictions. Conversely, targeting low-frequency components has been effective in enhancing attack transferability on black-box models. In this study, we introduce a frequency decomposition-based feature mixing method to exploit these frequency characteristics in both clean and adversarial samples. Our findings suggest that incorporating features of clean samples into adversarial features extracted from adversarial examples is more effective in attacking normally-trained models, while combining clean features with the adversarial features extracted from low-frequency parts decomposed from the adversarial samples yields better results in attacking defense models. However, a conflict issue arises when these two mixing approaches are employed simultaneously. To tackle the issue, we propose a cross-frequency meta-optimization approach comprising the meta-train step, meta-test step, and final update. In the meta-train step, we leverage the low-frequency components of adversarial samples to boost the transferability of attacks against defense models. Meanwhile, in the meta-test step, we utilize adversarial samples to stabilize gradients, thereby enhancing the attack's transferability against normally trained models. For the final update, we update the adversarial sample based on the gradients obtained from both meta-train and meta-test steps. Our proposed method is evaluated through extensive experiments on the ImageNet-Compatible dataset, affirming its effectiveness in improving the transferability of attacks on both normally-trained CNNs and defense models. The source code is available at https://github.com/WJJLL/MetaSSA.
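The split of an image into a low-frequency part $x_l$ and a high-frequency part $x_h$ can be illustrated with a minimal sketch. The abstract does not say which transform is used, so the radial FFT mask, the `radius` parameter, and the function name `frequency_decompose` are assumptions made for illustration only; the authors' actual decomposition is in the linked repository and may differ.

```python
import numpy as np

def frequency_decompose(x, radius):
    """Split a 2-D image into low- and high-frequency parts.

    Assumption: an ideal radial low-pass mask in the centered FFT spectrum.
    """
    h, w = x.shape
    spectrum = np.fft.fftshift(np.fft.fft2(x))        # move the DC term to the center
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    mask = dist <= radius                              # keep only low frequencies
    x_l = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    x_h = x - x_l                                      # residual holds the high frequencies
    return x_l, x_h

rng = np.random.default_rng(0)
x = rng.random((32, 32))                               # stand-in for one image channel
x_l, x_h = frequency_decompose(x, radius=4)
print(np.allclose(x_l + x_h, x))                       # True by construction
```

Defining $x_h$ as the residual keeps the decomposition exactly additive, so $x = x_l + x_h$ holds regardless of the mask shape.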

Paper Structure

This paper contains 25 sections, 9 equations, 6 figures, 7 tables, 1 algorithm.

Figures (6)

  • Figure 1: Average attack success rates (%) on four normally trained models and four defense models. Adversarial examples are crafted with MI-FGSM, using both AFM and LF-AFM within the same attack. "AFM ($t$) and LF-AFM ($10-t$) on Inc-v3" denotes AFM for the first $t$ iterations, followed by LF-AFM for the remaining $10-t$ iterations, with Inc-v3 as the source model.
  • Figure 2: The cross-frequency meta-optimization process at iteration $t$ comprises three steps: (1) low-frequency guided meta-train, (2) adversarial sample guided meta-test, and (3) final update. In the meta-train step, Low-Frequency Adversarial Features Mixing (LF-AFM) is applied sequentially to compute the current meta-train gradient $g_{tr}$, which is used to craft the temporary adversarial example $\tilde{x}^{(t, i)}$ at each inner iteration $i$. In the meta-test step, the gradients $\nabla_{\tilde{x}^{(t, i)}}$ computed with Adversarial Features Mixing (AFM) are averaged over all intermediate $\tilde{x}^{(t, i)}$ generated in the meta-train step. In the final update, the adversarial example $\tilde{x}^{(t+1,0)}$ is crafted from the gradients obtained in both the meta-train and meta-test steps.
  • Figure 3: Comparison with different feature mixing strategies.
  • Figure 4: The influence of the number of samples $N$ used in the cross-frequency meta-optimization.
  • Figure 5: Visualization of adversarial images and the corresponding perturbations crafted by various attack methods.
  • ...and 1 more figure
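The three steps in the Figure 2 caption (meta-train with LF-AFM, meta-test with AFM, final update) can be sketched on a toy problem. Everything concrete below is an assumption made for illustration and not the authors' implementation: the tiny tanh model (`W`, `v`), the 1-D rFFT low-pass filter, the convex mixing ratio `eta`, the sign-gradient step, and the choice to sum the two gradients in the final update. The momentum term of MI-FGSM is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 16, 8                       # toy input and feature sizes (hypothetical)
W = rng.normal(size=(H, D))        # toy feature extractor weights
v = rng.normal(size=H)             # toy classifier head

def low_pass(x, keep=4):
    """Keep the `keep` lowest rFFT bins of a 1-D signal (a symmetric projection)."""
    spec = np.fft.rfft(x)
    spec[keep:] = 0
    return np.fft.irfft(spec, n=x.size)

def mixed_loss_grad(x_in, x_clean, y, eta=0.3):
    """Logistic loss and its gradient w.r.t. x_in, with clean features mixed in."""
    f_in = np.tanh(W @ x_in)
    f_clean = np.tanh(W @ x_clean)
    logit = v @ ((1 - eta) * f_in + eta * f_clean)    # assumed convex feature mixing
    loss = np.logaddexp(0.0, -y * logit)              # y is -1 or +1
    dlogit = -y / (1.0 + np.exp(y * logit))           # d loss / d logit
    grad = dlogit * (1 - eta) * (W.T @ (v * (1 - f_in ** 2)))
    return loss, grad

def grad_afm(x_adv, x_clean, y):
    """AFM stand-in: adversarial features come from the full adversarial sample."""
    return mixed_loss_grad(x_adv, x_clean, y)[1]

def grad_lf_afm(x_adv, x_clean, y):
    """LF-AFM stand-in: adversarial features come from the low-pass-filtered sample."""
    g = mixed_loss_grad(low_pass(x_adv), x_clean, y)[1]
    return low_pass(g)             # chain rule; the filter is linear and symmetric

def meta_attack(x, y, eps=0.1, steps=10, n_inner=3):
    alpha = eps / steps
    x_adv = x.copy()
    for _ in range(steps):
        # (1) Meta-train: sequential LF-AFM steps produce temporary examples.
        x_tmp, g_tr, temps = x_adv.copy(), np.zeros_like(x), []
        for _ in range(n_inner):
            g = grad_lf_afm(x_tmp, x, y)
            g_tr += g / n_inner
            x_tmp = np.clip(x_tmp + alpha * np.sign(g), x - eps, x + eps)
            temps.append(x_tmp)
        # (2) Meta-test: average AFM gradients over the temporary examples.
        g_te = np.mean([grad_afm(xt, x, y) for xt in temps], axis=0)
        # (3) Final update from both gradients (summing them is an assumption).
        x_adv = np.clip(x_adv + alpha * np.sign(g_tr + g_te), x - eps, x + eps)
    return x_adv

x = rng.normal(size=D)
x_adv = meta_attack(x, y=1.0)
print(np.max(np.abs(x_adv - x)))   # perturbation size, bounded by eps
```

The temporary examples are discarded after each outer iteration; only the gradients they produce reach the adversarial example, which is how the sketch mirrors the caption's separation between the inner meta-train loop and the final update.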