M-to-N Backdoor Paradigm: A Multi-Trigger and Multi-Target Attack to Deep Learning Models

Linshan Hou, Zhongyun Hua, Yuhong Li, Yifeng Zheng, Leo Yu Zhang

TL;DR

This work introduces the M-to-N backdoor paradigm, in which a single input can be misclassified into any of $N$ attacker-chosen target classes, and the backdoor of each target class can be activated by any one of its $M$ triggers. The attack relies on a poisoned image generation framework comprising a trigger embedding network $\mathcal{H}$, a recovery network $\mathcal{R}$, and a discriminator $\mathcal{D}$, trained to embed triggers invisibly while still allowing them to be recovered. Clean images from the target classes serve as triggers, and only a small fraction $\rho$ of the training data is poisoned; the resulting backdoored models sustain a high attack success rate (ASR) across multiple targets and remain robust to common pre-processing operations and existing defenses. Experiments on MNIST, CIFAR-10, GTSRB, and ImageNet-10 report near-100% ASR for multiple targets with minimal degradation of clean accuracy, along with strong invisibility and resilience to defenses. The study highlights a new, more versatile class of backdoor threats and motivates the development of defenses capable of countering multi-target, multi-trigger paradigms.

Abstract

Deep neural networks (DNNs) are vulnerable to backdoor attacks, in which a backdoored model behaves normally on clean inputs but exhibits attacker-specified behavior on inputs containing triggers. Most previous backdoor attacks follow either the all-to-one or the all-to-all paradigm, which allows an attacker to manipulate an input to attack only a single target class. Moreover, both paradigms rely on a single trigger for backdoor activation, so the attack becomes ineffective if the trigger is destroyed. To address these limitations, we propose a new $M$-to-$N$ attack paradigm that allows an attacker to manipulate any input to attack $N$ target classes, where the backdoor of each of the $N$ target classes can be activated by any one of its $M$ triggers. Our attack selects $M$ clean images from each target class as triggers and leverages our proposed poisoned image generation framework to inject the triggers into clean images invisibly. Because the triggers share the same distribution as the clean training images, the targeted DNN models can generalize to the triggers during training, which enhances the effectiveness of our attack on multiple target classes. Extensive experimental results demonstrate that our new backdoor attack is highly effective in attacking multiple target classes and robust against pre-processing operations and existing defenses.
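
The bookkeeping implied by the abstract (choose $M$ clean images per target class as triggers, then poison a fraction $\rho$ of the training set) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `select_triggers` and `plan_poisoning` are hypothetical, the uniform choice of target class and trigger is an assumption, and so is the decision to skip samples that already belong to a target class. The embedding network $\mathcal{H}$ that would actually blend a trigger into an image is not modeled; the sketch only decides which sample receives which trigger and label.

```python
import random
from collections import defaultdict


def select_triggers(labels, target_classes, m, rng):
    """Pick m clean sample indices from each target class to act as triggers."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    # rng.sample raises ValueError if a class has fewer than m samples.
    return {c: rng.sample(by_class[c], m) for c in target_classes}


def plan_poisoning(labels, triggers, rho, rng):
    """Return a list of (sample_idx, trigger_idx, target_class) assignments.

    Assumptions (not taken from the paper): samples whose true label is a
    target class are never poisoned, and the target class and trigger are
    drawn uniformly at random for each poisoned sample.
    """
    targets = sorted(triggers)
    target_set = set(targets)
    candidates = [i for i, y in enumerate(labels) if y not in target_set]
    n_poison = min(int(rho * len(labels)), len(candidates))
    plan = []
    for sample_idx in rng.sample(candidates, n_poison):
        target = rng.choice(targets)
        trigger_idx = rng.choice(triggers[target])
        # A real pipeline would now embed image[trigger_idx] into
        # image[sample_idx] and relabel the result as `target`.
        plan.append((sample_idx, trigger_idx, target))
    return plan
```

For example, with a 10-class label list, `select_triggers(labels, [1, 2, 3], 4, random.Random(0))` yields four trigger indices for each of the three target classes, and `plan_poisoning(labels, triggers, 0.1, rng)` assigns a trigger and target label to 10% of the samples.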

Paper Structure (16 sections, 7 equations, 2 figures, 2 tables, 1 algorithm)

Figures (2)

  • Figure 1: An example of our $M$-to-$N$ backdoor attack on traffic sign classification. An input "Stop" sign can be misclassified as any of the target classes "No Entry", "Speed Limit", "Keep Right", and "Straight Ahead" when the sign is poisoned with a trigger that corresponds to that target class. Note that the backdoor of each target class can be activated by any one of its $M$ triggers.
  • Figure 2: The process of our backdoor injection.