Enhancing the Antidote: Improved Pointwise Certifications against Poisoning Attacks
Shijie Liu, Andrew C. Cullen, Paul Montague, Sarah M. Erfani, Benjamin I. P. Rubinstein
TL;DR
This work defends against poisoning attacks by providing pointwise-certified robustness guarantees: the prediction for an individual test sample is certified to remain unchanged when up to $r$ training samples are inserted, deleted, or modified. It proposes a general framework that combines Differential Privacy (approximate DP and Rényi DP) with the Sampled Gaussian Mechanism and an improved group privacy bound. The approach supports both multinomial label predictions and score-based outputs, and it demonstrates substantial gains in certified accuracy and maximum certification radius on MNIST, Fashion-MNIST, and CIFAR-10 relative to baselines. Combining DP guarantees, sub-sampling, and bagging yields per-example guarantees with practical implications for deploying models in sensitive settings, at a notable computational cost because multiple model instances must be trained.
Abstract
Poisoning attacks can disproportionately influence model behaviour by making small changes to the training corpus. While defences against specific poisoning attacks do exist, they generally provide no guarantees, leaving them open to being countered by novel attacks. In contrast, by examining worst-case behaviours, certified defences make it possible to guarantee the robustness of a sample against adversarial attacks that modify a finite number of training samples, a property known as pointwise certification. We achieve this by exploiting both Differential Privacy and the Sampled Gaussian Mechanism to ensure that the prediction for each testing instance is invariant to a finite number of poisoned examples. In doing so, our model provides guarantees of adversarial robustness that are more than twice as large as those provided by prior certifications.
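For readers unfamiliar with the Sampled Gaussian Mechanism named above, the following is a minimal sketch of one step of it in its commonly used form: Poisson sub-sampling, per-example clipping, and Gaussian noise added to the sum. It is an illustration under stated assumptions, not the authors' implementation. The function name `sampled_gaussian_sum`, its parameters (`q`, `clip_norm`, `sigma`), and the convention that the noise standard deviation is `sigma * clip_norm` are all assumptions introduced here.

```python
import numpy as np


def sampled_gaussian_sum(per_example_vecs, q, clip_norm, sigma, rng):
    """One step of a sampled Gaussian mechanism (illustrative sketch).

    per_example_vecs: array of shape (n, d), e.g. per-example gradients.
    q: probability with which each example is independently included.
    clip_norm: maximum L2 norm allowed for each included vector.
    sigma: noise multiplier; noise std is sigma * clip_norm (assumed convention).
    rng: a numpy.random.Generator.
    """
    n, d = per_example_vecs.shape

    # Poisson sub-sampling: keep each example independently with probability q.
    mask = rng.random(n) < q
    selected = per_example_vecs[mask]

    # Clip each selected vector so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(selected, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = selected * scale

    # Sum the clipped vectors and add isotropic Gaussian noise.
    noise = rng.normal(0.0, sigma * clip_norm, size=d)
    return clipped.sum(axis=0) + noise, mask


# Hypothetical usage: 100 examples with 5-dimensional vectors.
rng = np.random.default_rng(0)
vecs = rng.normal(size=(100, 5))
noisy_sum, mask = sampled_gaussian_sum(vecs, q=0.1, clip_norm=1.0, sigma=1.2, rng=rng)
```

Clipping bounds how much any single training example can change the sum, and the sub-sampling and noise are what a privacy accountant would then translate into a DP guarantee. That accounting step, and the certification built on top of it, are not shown here.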
