Megatron: Evasive Clean-Label Backdoor Attacks against Vision Transformer

Xueluan Gong, Bowei Tian, Meng Xue, Shuike Li, Yanjiao Chen, Qian Wang

TL;DR

Megatron introduces a stealthy clean-label backdoor attack for Vision Transformers by optimizing a transformer-aware trigger with two losses: $\mathcal{L}_{\alpha}$ (latent-layer attention alignment) and $\mathcal{L}_{\beta}$ (attention diffusion). It employs trigger masking to create multiple sub-triggers and uses a clean-label poisoning objective to maintain perceptual similarity to target samples. Across CIFAR-10, GTSRB, CIFAR-100, and Tiny ImageNet, Megatron achieves high attack success rates (often $\geq95\%$) with superior image quality metrics and demonstrates robustness to state-of-the-art defenses and various model structures. A theoretical analysis links $\mathcal{L}_{\beta}$ to diffusion-area effects on token importance, and a user study confirms strong invisibility. The work highlights practical risks of ViT backdoors and motivates development of defenses beyond conventional patch- and attention-map-based approaches.

Abstract

Vision transformers have achieved impressive performance in various vision-related tasks, but their vulnerability to backdoor attacks is under-explored. A handful of existing works focus on dirty-label attacks with wrongly-labeled poisoned training samples, which may fail if a benign model trainer corrects the labels. In this paper, we propose Megatron, an evasive clean-label backdoor attack against vision transformers, where the attacker injects the backdoor without manipulating the data-labeling process. To generate an effective trigger, we customize two loss terms based on the attention mechanism used in transformer networks, i.e., latent loss and attention diffusion loss. The latent loss aligns the last attention layer between triggered samples and clean samples of the target label. The attention diffusion loss emphasizes the attention diffusion area that encompasses the trigger. A theoretical analysis is provided to underpin the rationale behind the attention diffusion loss. Extensive experiments on CIFAR-10, GTSRB, CIFAR-100, and Tiny ImageNet demonstrate the effectiveness of Megatron. Megatron can achieve attack success rates of over 90% even when the position of the trigger is slightly shifted during testing. Furthermore, Megatron achieves better evasiveness than baselines regarding both human visual inspection and defense strategies (i.e., DBAVT, BAVT, Beatrix, TeCo, and SAGE).
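
As a rough illustration of what "aligning the last attention layer" between triggered samples and clean target-label samples could look like, the sketch below computes a mean squared distance between two batches of flattened attention features. The function name `latent_alignment_loss`, the list-of-lists input format, and the choice of squared error are all assumptions made for illustration; the abstract does not state the exact form of the latent loss, so this should not be read as the paper's definition.

```python
from typing import Sequence


def latent_alignment_loss(
    triggered: Sequence[Sequence[float]],
    target_clean: Sequence[Sequence[float]],
) -> float:
    """Mean squared distance between two batches of flattened features.

    Hypothetical stand-in for a latent loss: each row is assumed to be the
    flattened last-attention-layer output for one sample. `triggered` holds
    features of samples carrying the trigger, `target_clean` holds features
    of clean samples from the target label.
    """
    if not triggered or len(triggered) != len(target_clean):
        raise ValueError("batches must be non-empty and of equal size")

    total = 0.0
    count = 0
    for t_row, c_row in zip(triggered, target_clean):
        if len(t_row) != len(c_row):
            raise ValueError("feature rows must have equal length")
        for t, c in zip(t_row, c_row):
            total += (t - c) ** 2  # squared error is an assumed distance
            count += 1
    if count == 0:
        raise ValueError("feature rows must be non-empty")
    return total / count


if __name__ == "__main__":
    # Toy features only; real inputs would come from a ViT's last attention layer.
    print(latent_alignment_loss([[0.0, 0.0]], [[1.0, 3.0]]))
```

Under this assumed form, a trigger optimizer would adjust the trigger to drive the value toward zero, so that triggered inputs resemble target-label inputs in the model's last attention layer.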

Paper Structure

This paper contains 27 sections, 20 equations, 4 figures, 18 tables, 1 algorithm.

Figures (4)

  • Figure 1: Overview of Megatron. Megatron features three key components, i.e., trigger generation, trigger masking, and sample poisoning.
  • Figure 2: Comparison of backdoored samples generated by baseline attacks and Megatron. DBAVT-Bad is DBAVT-BadNet, and DBAVT-Wa is DBAVT-WaNet.
  • Figure 3: Impact of transparency value of $\phi_A$ and $\phi_D$.
  • Figure 4: User study results.