Enhancing All-to-X Backdoor Attacks with Optimized Target Class Mapping

Lei Wang, Yulong Tian, Hao Han, Fengyuan Xu

TL;DR

This work investigates All-to-X (A2X) backdoor attacks, which use multiple target classes, and shows that existing defenses largely fail against them. It proposes a two-step attack design: (1) similarity-based class grouping, which uses a surrogate model to form $X$ source groups, and (2) distance-aware target class assignment, formulated as Maximum Weight Bipartite Matching and solved with the Hungarian algorithm to maximize group–target separation. The method improves the attack success rate (ASR) over baseline A2X attacks by up to 28%, with average gains of 6.7%, 16.4%, and 14.1% on CIFAR10, CIFAR100, and Tiny-ImageNet, respectively, while maintaining clean accuracy and transferring well across surrogate models and knowledge scenarios. The findings highlight a significant, under-explored security risk posed by A2X backdoors, provide a practical framework for optimizing the source-to-target mapping, and underscore the need for defenses that account for multi-target backdoors.
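The target assignment step can be illustrated with a small sketch. It assumes a precomputed matrix of group-to-class distances; how those distances are measured is not specified here, so the values below are hypothetical, as are the function and variable names. For readability the sketch uses exhaustive search rather than the Hungarian algorithm. Both return a maximum-weight matching, but exhaustive search is only practical for a handful of groups.

```python
from itertools import permutations


def assign_targets(distance):
    """Pick one distinct target class per source group, maximizing total distance.

    distance[g][t] is an assumed, precomputed separation between source
    group g and candidate target class t (hypothetical input).
    Returns (best_total, assignment), where assignment[g] is the chosen column.
    """
    n_groups = len(distance)
    n_targets = len(distance[0])
    if n_groups > n_targets:
        raise ValueError("need at least as many candidate targets as groups")

    best_total, best_assignment = float("-inf"), None
    # Enumerate every way to give each group its own target (brute force).
    for targets in permutations(range(n_targets), n_groups):
        total = sum(distance[g][t] for g, t in enumerate(targets))
        if total > best_total:
            best_total, best_assignment = total, targets
    return best_total, list(best_assignment)


# Hypothetical distances: rows are 3 source groups, columns are 4 candidate targets.
distance = [
    [0.2, 0.9, 0.4, 0.7],
    [0.8, 0.3, 0.6, 0.1],
    [0.5, 0.7, 0.9, 0.2],
]
total, assignment = assign_targets(distance)
print(assignment)  # group 0 -> class 1, group 1 -> class 0, group 2 -> class 2
```

For larger problems, SciPy's `scipy.optimize.linear_sum_assignment(distance, maximize=True)` solves the same assignment problem in polynomial time.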

Abstract

Backdoor attacks pose severe threats to machine learning systems, prompting extensive research in this area. However, most existing work focuses on single-target All-to-One (A2O) attacks, overlooking the more complex All-to-X (A2X) attacks with multiple target classes, which are often assumed to have low attack success rates. In this paper, we first demonstrate that A2X attacks are robust against state-of-the-art defenses. We then propose a novel attack strategy that enhances the success rate of A2X attacks while maintaining robustness by optimizing grouping and target class assignment mechanisms. Our method improves the attack success rate by up to 28%, with average improvements of 6.7%, 16.4%, and 14.1% on CIFAR10, CIFAR100, and Tiny-ImageNet, respectively. We anticipate that this study will raise awareness of A2X attacks and stimulate further research in this under-explored area. Our code is available at https://github.com/kazefjj/A2X-backdoor

Paper Structure

This paper contains 41 sections, 3 equations, 12 figures, 8 tables, 1 algorithm.

Figures (12)

  • Figure 1: Comparison of A2O and A2X attacks. In A2O attacks, all triggered samples from source classes are misclassified into a single target class (Class 1). In A2X attacks, source classes are clustered into $X$ groups ($X$=3 shown here), with each group assigned a distinct target class. Triggered samples from each group are then misclassified to their group's designated target class.
  • Figure 2: The Attack Success Rate of A2X Attacks under Different Poisoning Rates on CIFAR10 with ResNet18.
  • Figure 3: The t-SNE visualization of the CIFAR10 dataset.
  • Figure 4: Comparison of our method with existing approaches. We first cluster similar classes into the same class group, resulting in simpler decision boundaries that are easier to learn (upper part). We then select a more distant target class for each class group to reduce feature interference during model training (lower part).
  • Figure 5: Attack Success Rates across Different Poisoning Rates and Datasets. "Baseline" lines show the results of the baseline methods, and "Ours" lines show the results of our proposed method. Lines represent average values from five repeated experiments, with shaded regions indicating standard deviations.
  • ...and 7 more figures
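
The A2X mapping described in the Figure 1 caption amounts to a lookup from source class to the target class of its group. The following is a minimal sketch of that lookup only; the grouping, the target choices, and the function name are hypothetical and are not taken from the paper's code.

```python
def a2x_target_labels(labels, groups, group_targets):
    """Map each source label to the target class assigned to its group.

    groups is a list of lists of class ids; group_targets[i] is the
    target class for groups[i]. Both are illustrative inputs.
    """
    class_to_target = {}
    for group, target in zip(groups, group_targets):
        for cls in group:
            class_to_target[cls] = target
    return [class_to_target[y] for y in labels]


# Hypothetical example with X = 3 groups over 10 classes.
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
group_targets = [5, 9, 1]
mapped = a2x_target_labels([0, 4, 9, 2], groups, group_targets)
```

An A2O attack is the special case with a single group, where every source class maps to the same target.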