Protecting against simultaneous data poisoning attacks
Neel Alex, Shoaib Ahmed Siddiqui, Amartya Sanyal, David Krueger
TL;DR
This work addresses the realistic threat of multiple simultaneous data poisoning attacks. It shows that several backdoors can be inserted into a single model with minimal impact on clean accuracy, and that existing defenses perform poorly in this setting. It introduces BaDLoss, a defense based on loss dynamics that uses a small set of bona fide clean probes to detect and remove anomalous training examples before retraining, achieving strong defense performance with limited degradation to clean accuracy. In experiments on CIFAR-10 and GTSRB, BaDLoss lowers the average attack success rate in the multi-attack setting to 7.98% and 10.29%, respectively, compared with an average of 64.48% and 84.28% across the other defenses evaluated, while remaining effective in single-attack scenarios. The work highlights the importance of evaluating defenses under multi-attack scenarios and suggests that analyzing per-example loss trajectories can provide a robust, adaptable defense when many poisoning attacks are present at once.
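The loss-trajectory idea behind BaDLoss can be illustrated with a minimal sketch. The excerpt does not give BaDLoss's exact scoring rule, so this sketch assumes, purely for illustration, that each training example's per-epoch loss trajectory is scored by its mean Euclidean distance to its k nearest clean-probe trajectories, and that the removal threshold is a percentile of the probes' own leave-one-out scores. The function names `flag_suspicious` and `_knn_mean_dist`, the parameters `k` and `pct`, and the synthetic loss curves are hypothetical; examples that get flagged would then be removed before retraining.

```python
import numpy as np

def _knn_mean_dist(queries, refs, k, exclude_self=False):
    # Euclidean distance between every query trajectory and every reference trajectory.
    d = np.linalg.norm(queries[:, None, :] - refs[None, :, :], axis=-1)
    if exclude_self:
        # Queries and refs are the same set: ignore each trajectory's zero distance to itself.
        np.fill_diagonal(d, np.inf)
    # Mean distance to the k closest references.
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

def flag_suspicious(train_traj, probe_traj, k=3, pct=95.0):
    """Hypothetical k-NN anomaly score on loss trajectories (shape: n_examples x n_epochs).

    Assumption: a training example is suspicious if its trajectory is farther from the
    clean probes than `pct` percent of the probes are from each other.
    """
    train_traj = np.asarray(train_traj, dtype=float)
    probe_traj = np.asarray(probe_traj, dtype=float)
    k = min(k, len(probe_traj) - 1)  # leave-one-out on probes needs k <= n_probes - 1
    train_scores = _knn_mean_dist(train_traj, probe_traj, k)
    probe_scores = _knn_mean_dist(probe_traj, probe_traj, k, exclude_self=True)
    threshold = np.percentile(probe_scores, pct)
    return train_scores > threshold, train_scores

# Synthetic illustration (assumed curves, not data from the paper):
# clean examples decay slowly; the injected "poisoned" examples decay much faster.
rng = np.random.default_rng(0)
T = 20
epochs = np.arange(T)
clean_curve = 2.3 * np.exp(-0.1 * epochs)
poison_curve = 2.3 * np.exp(-0.8 * epochs)
probes = clean_curve + 0.05 * rng.standard_normal((30, T))
train = np.vstack([
    clean_curve + 0.05 * rng.standard_normal((50, T)),   # 50 clean examples
    poison_curve + 0.05 * rng.standard_normal((5, T)),   # 5 poisoned examples
])
flags, scores = flag_suspicious(train, probes)
keep = train[~flags]  # the retained set a retraining step would use
```

Calibrating the threshold on the probes, rather than on the training set, avoids fixing in advance what fraction of the data to remove; this is a choice made for the sketch and not necessarily how BaDLoss sets its threshold.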
Abstract
Current backdoor defense methods are evaluated against a single attack at a time. This is unrealistic, as powerful machine learning systems are trained on large datasets scraped from the internet, which may be attacked multiple times by one or more attackers. We demonstrate that simultaneously executed data poisoning attacks can effectively install multiple backdoors in a single model without substantially degrading clean accuracy. Furthermore, we show that existing backdoor defense methods do not effectively prevent attacks in this setting. Finally, we leverage insights into the nature of backdoor attacks to develop a new defense, BaDLoss, that is effective in the multi-attack setting. With minimal clean accuracy degradation, BaDLoss attains an average attack success rate in the multi-attack setting of 7.98% on CIFAR-10 and 10.29% on GTSRB, compared with 64.48% and 84.28%, respectively, for the average of the other defenses.
