Improving Transferable Targeted Attacks with Feature Tuning Mixup

Kaisheng Liang, Xuelong Dai, Yanjie Li, Dong Wang, Bin Xiao

TL;DR

This work introduces Feature Tuning Mixup (FTM), a method that learns attack-specific feature perturbations in the multi-layer feature space to improve transferable targeted adversarial examples. By combining learnable perturbations with random clean features and employing a momentum-based stochastic update, FTM enhances cross-model transferability with modest computational overhead. An ensemble extension (FTM-E) further boosts effectiveness across diverse architectures, including CNNs and Vision Transformers, demonstrated on an ImageNet-compatible dataset and in scenarios involving multimodal LLMs. The approach outperforms state-of-the-art baselines (notably CFM) and offers a practical, scalable avenue for robust targeted transfer attacks.

Abstract

Deep neural networks (DNNs) exhibit vulnerability to adversarial examples that can transfer across different DNN models. A particularly challenging problem is developing transferable targeted attacks that can mislead DNN models into predicting specific target classes. While various methods have been proposed to enhance attack transferability, they often incur substantial computational costs while yielding limited improvements. Recent clean feature mixup methods use random clean features to perturb the feature space but lack optimization for disrupting adversarial examples, overlooking the advantages of attack-specific perturbations. In this paper, we propose Feature Tuning Mixup (FTM), a novel method that enhances targeted attack transferability by combining both random and optimized noises in the feature space. FTM introduces learnable feature perturbations and employs an efficient stochastic update strategy for optimization. These learnable perturbations facilitate the generation of more robust adversarial examples with improved transferability. We further demonstrate that attack performance can be enhanced through an ensemble of multiple FTM-perturbed surrogate models. Extensive experiments on the ImageNet-compatible dataset across various DNN models demonstrate that our method achieves significant improvements over state-of-the-art methods while maintaining low computational cost.

Paper Structure

This paper contains 19 sections, 8 equations, 8 figures, 9 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Effectiveness and efficiency of targeted attacks. Average attack success rates on 14 black-box models, along with the computation time required to generate an adversarial example. Our methods demonstrate superior performance with low computational cost, surpassing state-of-the-art methods.
  • Figure 2: Overview of our FTM attack. In the forward pass, the learnable perturbations are added to the output features. In each attack iteration, only a small, randomly selected subset of network layers receives perturbation updates and clean feature mixup. In the backward pass, the gradients of the selected perturbations and of the adversarial image are computed together and used for the update.
  • Figure 3: Targeted attack success rates (%) as a function of the number of iterations. The source models are RN-50 and Inc-v3, respectively.
  • Figure 4: Attack success rates with different mixing ratio $\beta$. Each color represents the attack effectiveness on a specific target model.
  • Figure 5: Attack performance and computational costs with different numbers of ensemble models. Left: Our FTM vs. CFM using ensemble strategy. Right: Computational costs of our FTM and RAP. -ens2 means using 2 perturbed models, and so on.
  • ...and 3 more figures
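
The mechanism summarized in the Figure 2 and Figure 4 captions (random layer selection, clean feature mixup with ratio $\beta$, an added learnable perturbation, and a momentum-based update) can be illustrated with a toy sketch. This is not the authors' implementation: the mixing formula, the function names, the selection probability, and the hyperparameters `mu` and `step` are all assumptions made for illustration, and features are plain Python lists rather than network tensors.

```python
import random


def select_layers(num_layers, prob, rng):
    """Pick the layers to perturb in this iteration.

    Assumption: each layer is chosen independently with probability `prob`.
    """
    return [i for i in range(num_layers) if rng.random() < prob]


def perturb_features(adv_feat, clean_feat, delta, beta):
    """Mix clean features into adversarial features and add a learnable offset.

    Assumed form (not taken from the paper): (1 - beta) * adv + beta * clean + delta.
    """
    return [(1.0 - beta) * a + beta * c + d
            for a, c, d in zip(adv_feat, clean_feat, delta)]


def momentum_update(delta, grad, velocity, mu=0.9, step=0.01):
    """Take one momentum step on the perturbation along the supplied gradient.

    `mu` and `step` are hypothetical hyperparameters; the sign convention of
    `grad` (ascent or descent on the attack loss) is left to the caller.
    """
    new_velocity = [mu * v + g for v, g in zip(velocity, grad)]
    new_delta = [d + step * v for d, v in zip(delta, new_velocity)]
    return new_delta, new_velocity


if __name__ == "__main__":
    rng = random.Random(0)
    layers = select_layers(num_layers=8, prob=0.25, rng=rng)
    mixed = perturb_features([1.0, 2.0], [3.0, 4.0], [0.5, -0.5], beta=0.5)
    delta, velocity = momentum_update([0.0, 0.0], [1.0, -1.0], [0.0, 0.0])
    print(layers, mixed, delta, velocity)
```

In a real attack these steps would run inside forward hooks on the surrogate network, with the gradients for the perturbations and the adversarial image obtained from the same backward pass, as the Figure 2 caption describes.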