The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data

Zixuan Zhu, Rui Wang, Cong Zou, Lihua Jing

TL;DR

This work defends against backdoor attacks on DNN training by using prediction entropy to separate poisoned from benign samples and by introducing a dual-network framework, The Victim and The Beneficiary (V&B), that trains a clean model directly on poisoned data without extra benign samples. The Victim network is trained on suspicious samples and serves as a poisoned-sample detector, while the Beneficiary network learns from the credible samples the Victim selects; a semi-supervised suppression stage then uses the Victim's knowledge to erase remaining backdoors. AttentionMix, a strong data augmentation, disrupts trigger patterns through region-aware mixing, and a Gaussian Mixture Model-based warm-up automates sample filtering. On CIFAR-10 and an ImageNet subset, V&B achieves near-zero attack success rates with minimal loss of benign accuracy, outperforming state-of-the-art defenses and remaining practical when benign data are unavailable.

Abstract

Recently, backdoor attacks have posed a serious security threat to the training process of deep neural networks (DNNs). The attacked model behaves normally on benign samples but outputs a specific result when the trigger is present. However, compared with the rapid progress of backdoor attacks, existing defenses either struggle to handle these threats effectively or require benign samples, which may be unavailable in real scenarios. In this paper, we find that poisoned samples and benign samples can be distinguished by their prediction entropy. This inspires us to propose a novel dual-network training framework: The Victim and The Beneficiary (V&B), which exploits a poisoned model to train a clean model without extra benign samples. Firstly, we sacrifice the Victim network to be a powerful poisoned sample detector by training it on suspicious samples. Secondly, we train the Beneficiary network on the credible samples selected by the Victim to inhibit backdoor injection. Thirdly, a semi-supervised suppression strategy is adopted to erase potential backdoors and improve model performance. Furthermore, to better inhibit missed poisoned samples, we propose a strong data augmentation method, AttentionMix, which works well with our proposed V&B framework. Extensive experiments on two widely used datasets against six state-of-the-art attacks demonstrate that our framework is effective in preventing backdoor injection and robust to various attacks while maintaining performance on benign samples. Our code is available at https://github.com/Zixuan-Zhu/VaB.
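The abstract's key observation is that prediction entropy separates poisoned from benign samples. As a minimal sketch of that quantity (not the authors' implementation), the snippet below computes the Shannon entropy of a softmax prediction from raw logits using NumPy. The function name `prediction_entropy` and the toy logits are illustrative assumptions; whether poisoned samples fall at the low or the high end of the entropy range depends on the training strategy and is not assumed here.

```python
import numpy as np

def prediction_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution for each row of logits.

    Hypothetical helper for illustration; it is not taken from the paper's code.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    probs = np.exp(log_probs)
    return -(probs * log_probs).sum(axis=1)

# Toy 3-class logits: a uniform prediction and a highly confident one.
uniform_entropy = prediction_entropy([[0.0, 0.0, 0.0]])[0]   # equals ln(3)
peaked_entropy = prediction_entropy([[10.0, 0.0, 0.0]])[0]   # close to zero
```

A confident prediction yields entropy near zero, while a uniform prediction over C classes yields ln(C), so per-sample entropies give a one-dimensional score on which samples can be split.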

Paper Structure

This paper contains 23 sections, 11 equations, 6 figures, and 7 tables.

Figures (6)

  • Figure 1: The average prediction entropy of benign samples versus poisoned samples crafted by 6 backdoor attacks. We conduct the experiment on CIFAR-10 with ResNet-18 under a poisoning rate of 10%, using our special training strategy (described in the warm-up section).
  • Figure 2: (a) The pipeline of our training framework. Net V is the Victim network (i.e., the poisoned network), which learns only from the filtered suspicious set, and Net B is the Beneficiary network (i.e., the clean network) that we want to obtain. Red triangles represent poisoned samples, and green circles represent benign samples. The color of each network indicates its degree of poisoning: the darker the color, the deeper the poisoning. Deep blue areas are the three training stages, and light blue areas are basic blocks that are executed many times in the corresponding stage. (b) An example of AttentionMix. The bottom four images are the non-activation region of the cat image, the activation region of the cat image, the same activation region in the dog image, and the final generated image, respectively. Their labels are shown below the images. (c) Suppression for poisoned samples and benign samples. The target label is assumed to be the first label. Suppression reduces the target-label component of poisoned samples but has little influence on the ground-truth label of benign samples.
  • Figure 3: Ablation study of mixing threshold $t_m$ on CIFAR-10 (left) and ImageNet subset (right).
  • Figure 4: The distributions of prediction entropy on CIFAR-10 (left) and the ImageNet subset (right) under the BadNets attack. The blue histogram is the probability density histogram of all samples' prediction entropy. The red and green dotted lines are the two Gaussian distributions fitted by the GMM, denoting the distribution of suspicious samples and the distribution of credible samples, respectively.
  • Figure 5: Poisoned samples crafted by different backdoor attacks for CIFAR-10 and the ImageNet subset, including BadNets, Blend, WaNet, Dynamic, CL, and SIG. The first sample in each of the two rows is benign.
  • ...and 1 more figure
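
The Figure 4 caption describes fitting two Gaussians to the prediction-entropy histogram with a GMM. The sketch below shows one way such a fit could be done, using a plain NumPy EM loop on one-dimensional scores. The function name `fit_two_gaussians`, the min/max initialization, the iteration count, and the synthetic data are all assumptions made for illustration and are not taken from the paper or its code. The sketch also does not decide which component is the suspicious one.

```python
import numpy as np

def fit_two_gaussians(x, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative sketch).

    Returns weights, means, variances, and per-sample responsibilities,
    with components ordered by increasing mean.
    """
    x = np.asarray(x, dtype=np.float64)
    # Assumed initialization: means at the data extremes, shared variance.
    means = np.array([x.min(), x.max()])
    variances = np.full(2, x.var() + 1e-6)
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities computed in log space for stability.
        diff = x[:, None] - means
        log_pdf = -0.5 * (np.log(2.0 * np.pi * variances) + diff**2 / variances)
        log_joint = np.log(weights) + log_pdf
        log_joint -= log_joint.max(axis=1, keepdims=True)
        resp = np.exp(log_joint)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    order = np.argsort(means)
    # resp comes from the final E-step of the loop.
    return weights[order], means[order], variances[order], resp[:, order]

# Synthetic stand-in scores with two modes (not real entropies or paper data).
rng = np.random.default_rng(0)
entropies = np.concatenate([rng.normal(0.1, 0.05, 300), rng.normal(1.5, 0.2, 700)])
weights, means, variances, resp = fit_two_gaussians(entropies)
component = resp.argmax(axis=1)  # 0 = lower-mean mode, 1 = higher-mean mode
```

Each sample is assigned to the component with the larger responsibility, which splits the data into two sets without a hand-tuned threshold. Mapping the two components to "suspicious" and "credible" would follow the paper's warm-up section, which is not part of this excerpt.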