Beating Backdoor Attack at Its Own Game

Min Liu, Alberto Sangiovanni-Vincentelli, Xiangyu Yue

TL;DR

This work proposes a simple but highly effective defense framework which injects non-adversarial backdoors targeting poisoned samples and achieves state-of-the-art defense effectiveness with by far the lowest performance drop on clean data.

Abstract

Deep neural networks (DNNs) are vulnerable to backdoor attack, which does not affect the network's performance on clean data but would manipulate the network behavior once a trigger pattern is added. Existing defense methods have greatly reduced attack success rate, but their prediction accuracy on clean data still lags behind a clean model by a large margin. Inspired by the stealthiness and effectiveness of backdoor attack, we propose a simple but highly effective defense framework which injects non-adversarial backdoors targeting poisoned samples. Following the general steps in backdoor attack, we detect a small set of suspected samples and then apply a poisoning strategy to them. The non-adversarial backdoor, once triggered, suppresses the attacker's backdoor on poisoned data, but has limited influence on clean data. The defense can be carried out during data preprocessing, without any modification to the standard end-to-end training pipeline. We conduct extensive experiments on multiple benchmarks with different architectures and representative attacks. Results demonstrate that our method achieves state-of-the-art defense effectiveness with by far the lowest performance drop on clean data. Considering the surprising defense ability displayed by our framework, we call for more attention to utilizing backdoor for backdoor defense. Code is available at https://github.com/minliu01/non-adversarial_backdoor.
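The preprocessing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stamp is assumed to be a small constant patch in the top-left corner, and the names `apply_stamp`, `inject_nab`, and `filter_test_inputs` are hypothetical. The suspected indices and pseudo labels are taken as given, since the abstract does not specify how they are produced. The test-time filter follows the description in the Figure 2 caption: an input whose prediction changes once the stamp is added is flagged as poisoned.

```python
import numpy as np

def apply_stamp(images, size=2, value=1.0):
    """Return a copy of `images` (N, H, W, C) carrying a stamp.

    Assumption: the stamp is a size x size patch in the top-left corner
    set to `value`. The actual trigger pattern may differ.
    """
    stamped = images.copy()
    stamped[:, :size, :size, :] = value
    return stamped

def inject_nab(images, labels, suspected_idx, pseudo_labels):
    """Stamp the suspected samples and relabel them with pseudo labels.

    All other samples are left unchanged, so a standard training
    pipeline can consume the returned arrays directly.
    """
    images = images.copy()
    labels = labels.copy()
    images[suspected_idx] = apply_stamp(images[suspected_idx])
    labels[suspected_idx] = pseudo_labels
    return images, labels

def filter_test_inputs(predict, images):
    """Flag inputs whose prediction differs with and without the stamp."""
    plain = predict(images)
    stamped = predict(apply_stamp(images))
    return plain != stamped
```

Here `predict` stands for any callable that maps a batch of images to predicted class indices. Because the defense only edits the dataset, the training loop itself needs no changes.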

Paper Structure

This paper contains 35 sections, 2 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Representations under the effect of adversarial backdoor (AB) and non-adversarial backdoor (NAB), which are injected by attackers and defenders respectively. "Stamp" is the trigger pattern for NAB. (a) Clean samples are not influenced by backdoor. (b) AB changes model behavior on poisoned samples. (c) NAB is not triggered on clean samples. (d) NAB suppresses the effectiveness of AB on poisoned samples.
  • Figure 2: Overview of the proposed framework. The attacker injects an adversarial backdoor by selecting and poisoning a set of clean samples. After obtaining the dataset, the defender detects and poisons a set of suspected samples to inject the non-adversarial backdoor. Both attack and defense take place in the standard end-to-end training pipeline. In the testing stage, we stamp each input to keep the non-adversarial backdoor triggered. We can also adopt a test data filtering technique that compares the predictions with and without the stamp. Samples with inconsistent predictions are identified as poisoned.
  • Figure 3: Detection and pseudo label accuracy on CIFAR-10. The maximal detection accuracy for the CL attack is $\min(\frac{\lambda}{\mu}, 1)=0.5$. Pseudo label accuracy is calculated on LGA-detected samples.
  • Figure 4: Clean accuracy, backdoor accuracy, attack success rate, and defense success rate (%) under different detection accuracies and pseudo label accuracies. The experiments are conducted on CIFAR-10 under BadNets, WaNet, and CL. To generate pseudo labels of accuracy $p$, we randomly change a fraction $1-p$ of the true labels to a different class. For a detection accuracy $q$, we randomly select $qN$ poisoned samples and $(1-q)N$ clean samples, where $N$ is the size of the training set.
  • Figure 5: Examples of (a) raw images and saliency maps of their (b) clean, (c) stamped clean, (d) poisoned, (e) stamped and poisoned versions, which are obtained using NAB under BadNets attack.
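
The controlled setup described in the Figure 4 caption can be sketched as below. This is an illustrative sketch only, assuming NumPy arrays of integer labels and index arrays for the poisoned and clean samples. The function names are hypothetical, and the argument `n` plays the role of $N$ in the caption.

```python
import numpy as np

def noisy_pseudo_labels(true_labels, p, num_classes, rng):
    """Keep a fraction p of the labels and move the rest to a different class."""
    labels = np.asarray(true_labels).copy()
    n = len(labels)
    n_wrong = int(round((1 - p) * n))
    wrong_idx = rng.choice(n, size=n_wrong, replace=False)
    # An offset in [1, num_classes - 1] taken modulo num_classes
    # always lands on a class different from the original one.
    offsets = rng.integers(1, num_classes, size=n_wrong)
    labels[wrong_idx] = (labels[wrong_idx] + offsets) % num_classes
    return labels

def sample_detected_set(poisoned_idx, clean_idx, q, n, rng):
    """Draw round(q * n) poisoned indices and fill the rest with clean ones."""
    n_poisoned = int(round(q * n))
    picked_poisoned = rng.choice(poisoned_idx, size=n_poisoned, replace=False)
    picked_clean = rng.choice(clean_idx, size=n - n_poisoned, replace=False)
    return np.concatenate([picked_poisoned, picked_clean])
```

Passing an explicit generator such as `np.random.default_rng(0)` as `rng` keeps each sweep over $p$ and $q$ reproducible.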