The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data
Zixuan Zhu, Rui Wang, Cong Zou, Lihua Jing
TL;DR
This work tackles backdoor attacks in DNN training by exploiting prediction entropy to separate poisoned from benign samples and by using a dual-network framework, The Victim and The Beneficiary (V&B), to train a clean model directly on poisoned data without benign examples. The Victim acts as a powerful detector trained on suspicious data, while the Beneficiary learns from credible data with AttentionMix augmentation, further enhanced by semi-supervised suppression that uses the Victim's knowledge to erase backdoors. A strong data augmentation, AttentionMix, disrupts trigger patterns by region-aware mixing, and a Gaussian Mixture Model-based warming-up automates sample filtering. Across CIFAR-10 and an ImageNet subset, V&B achieves near-zero attack success rates with minimal benign accuracy loss, outperforming state-of-the-art defenses and demonstrating practical viability when benign data are unavailable.
Abstract
Recently, backdoor attacks have posed a serious security threat to the training process of deep neural networks (DNNs). The attacked model behaves normally on benign samples but outputs a specific result when the trigger is present. However, compared with the rapid progress of backdoor attacks, existing defenses either struggle to deal with these threats effectively or require benign samples to work, which may be unavailable in real scenarios. In this paper, we find that poisoned samples and benign samples can be distinguished by prediction entropy. This inspires us to propose a novel dual-network training framework: The Victim and The Beneficiary (V&B), which exploits a poisoned model to train a clean model without extra benign samples. Firstly, we sacrifice the Victim network to be a powerful poisoned sample detector by training on suspicious samples. Secondly, we train the Beneficiary network on the credible samples selected by the Victim to inhibit backdoor injection. Thirdly, a semi-supervised suppression strategy is adopted for erasing potential backdoors and improving model performance. Furthermore, to better inhibit missed poisoned samples, we propose a strong data augmentation method, AttentionMix, which works well with our proposed V&B framework. Extensive experiments on two widely used datasets against 6 state-of-the-art attacks demonstrate that our framework is effective in preventing backdoor injection and robust to various attacks while maintaining the performance on benign samples. Our code is available at https://github.com/Zixuan-Zhu/VaB.
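
The entropy-based separation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy logits, and the fixed threshold of 0.5 are all hypothetical, and the sketch assumes that suspicious samples are the ones with low prediction entropy. The TL;DR states that a Gaussian Mixture Model automates the filtering; a fixed threshold is swapped in here only to keep the example short.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prediction_entropy(logits):
    # Shannon entropy (in nats) of the predicted class distribution.
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def split_by_entropy(batch_logits, threshold):
    # ASSUMPTION: low entropy marks a sample as suspicious.
    # `threshold` is a hypothetical hand-picked value, not from the paper.
    suspicious, credible = [], []
    for idx, logits in enumerate(batch_logits):
        if prediction_entropy(logits) < threshold:
            suspicious.append(idx)
        else:
            credible.append(idx)
    return suspicious, credible

# Toy logits for three samples over three classes (made-up values).
batch = [
    [9.0, 0.0, 0.0],  # very confident prediction, low entropy
    [1.0, 0.8, 1.1],  # near-uniform prediction, high entropy
    [0.2, 0.1, 0.3],  # near-uniform prediction, high entropy
]
suspicious, credible = split_by_entropy(batch, threshold=0.5)
print("suspicious:", suspicious)
print("credible:", credible)
```

In this toy setting the indices in `suspicious` would go to the Victim network and those in `credible` to the Beneficiary network, mirroring the division of labor described above.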
