Enhancing the Antidote: Improved Pointwise Certifications against Poisoning Attacks

Shijie Liu; Andrew C. Cullen; Paul Montague; Sarah M. Erfani; Benjamin I. P. Rubinstein

TL;DR

This work addresses poisoning attacks by delivering pointwise-certified robustness guarantees for individual test samples under bounded training-data changes up to radius $r$. It advances a general framework that combines Differential Privacy (ADP and Rényi-DP) with the Sampled Gaussian Mechanism and improved group privacy to certify a test instance against insertions, deletions, or modifications of training data. The approach supports both multinomial label predictions and score-based outputs, and demonstrates substantial gains in certified accuracy and maximum certification radius across MNIST, Fashion-MNIST, and CIFAR-10 relative to baselines. The combination of DP guarantees, sub-sampling, and bagging enables per-example robust guarantees with practical implications for deploying models in sensitive settings, albeit with notable computational costs due to training multiple model instances.

Abstract

Poisoning attacks can disproportionately influence model behaviour by making small changes to the training corpus. While defences against specific poisoning attacks do exist, they generally provide no guarantees, leaving them open to being countered by novel attacks. In contrast, by examining worst-case behaviours, Certified Defences make it possible to guarantee the robustness of a sample against adversarial attacks that modify a finite number of training samples, known as pointwise certification. We achieve this by exploiting both Differential Privacy and the Sampled Gaussian Mechanism to ensure that the prediction for each testing instance is invariant to a finite number of poisoned examples. In doing so, our model provides guarantees of adversarial robustness that are more than twice as large as those provided by prior certifications.
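To make the Sampled Gaussian Mechanism (SGM) named in the abstract more concrete, the following is a minimal sketch of one noisy gradient estimate in the usual DP-SGD style: Poisson sub-sampling at rate $q$, per-example clipping, and Gaussian noise scaled by $\sigma$. The function name `sampled_gaussian_step`, the `clip_norm` parameter, and the normalisation by the expected batch size are assumptions made for illustration; they are not taken from the paper's implementation.

```python
import numpy as np

def sampled_gaussian_step(per_example_grads, q, sigma, clip_norm, rng):
    """Illustrative SGM gradient estimate (hypothetical helper, not the paper's code).

    per_example_grads: array of shape (n, d), one gradient per training example.
    q: sampling ratio, sigma: noise multiplier, clip_norm: L2 clipping bound (assumed).
    """
    n, d = per_example_grads.shape
    # Poisson sub-sampling: each example is kept independently with probability q.
    mask = rng.random(n) < q
    sampled = per_example_grads[mask]
    # Clip each sampled gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(sampled, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = sampled * scale
    # Sum the clipped gradients and add Gaussian noise with std sigma * clip_norm.
    noisy_sum = clipped.sum(axis=0) + rng.normal(0.0, sigma * clip_norm, size=d)
    # Normalise by the expected batch size q * n (an assumed convention).
    return noisy_sum / (q * n)

rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0], [0.0, 1.0]])
estimate = sampled_gaussian_step(grads, q=1.0, sigma=0.0, clip_norm=1.0, rng=rng)
```

With `q=1.0` and `sigma=0.0` the step reduces to the plain average of the clipped gradients, which is a convenient sanity check before enabling sub-sampling and noise.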

Paper Structure (26 sections, 5 theorems, 38 equations, 2 figures, 2 tables)

Introduction
Data Poisoning Attacks and Defences
Threat Model.
Certified Defences.
Outcomes Guarantee
Differential Privacy.
Sampled Gaussian Mechanism with Improved Group Privacy.
Outcomes-Guaranteed Certifications
Algorithmic Implementation
Training.
Certification.
Experiments
Limitations and Future Directions
Conclusion
Appendix
...and 11 more sections

Key Result

Theorem 8

If a randomised function $M$ obtained by SGM with sample ratio $q$ and noise level $\sigma$ achieves $(\alpha, \operatorname{SG}(\alpha, M, q, \sigma))$-Rényi-DP for all datasets $\mathcal{D}_1$ and $\mathcal{D}_2 \in \mathcal{B}(\mathcal{D}_1,1)$, then for all datasets $\mathcal{D}_3 \in \mathcal{B}\dots$

Figures (2)

Figure 1: The left column contains certified accuracy plots for the method RDP-multinomial against different noise levels ($\sigma$); the right column contains certified accuracy plots for comparisons against variants and baselines. In the plots, the X-axis is radius $r$ (symmetric difference) while the Y-axis is the corresponding certified accuracy $CA_r$ at radius $r$.
Figure 2: Certified accuracy plots for RDP-multinomial with the proposed improved group privacy (RDP-multinomial) against RDP-multinomial with standard group privacy (RDP-multinomial-GP) on the MNIST and Fashion-MNIST datasets.
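The certified accuracy $CA_r$ plotted in these figures can be sketched as follows, assuming it is the fraction of test samples that are both correctly classified and certified at a radius of at least $r$. That reading, the helper name `certified_accuracy`, and the toy arrays are assumptions for illustration; the paper's exact definition may differ in detail.

```python
import numpy as np

def certified_accuracy(pred, true, cert_radius, r):
    """Fraction of test samples predicted correctly with certified radius >= r.

    pred, true: predicted and ground-truth labels per test sample.
    cert_radius: certified radius per test sample (assumed to be precomputed).
    """
    pred = np.asarray(pred)
    true = np.asarray(true)
    cert_radius = np.asarray(cert_radius)
    correct = pred == true
    certified = cert_radius >= r
    return float(np.mean(correct & certified))

# Toy data (hypothetical): four test samples.
pred = [0, 1, 2, 1]
true = [0, 1, 1, 1]
radii = [5, 2, 7, 0]
curve = [certified_accuracy(pred, true, radii, r) for r in (0, 2, 3, 6)]
```

Evaluating the helper over a grid of radii gives a non-increasing curve of the kind shown on the plots, with radius $r$ on the X-axis and $CA_r$ on the Y-axis.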

Theorems & Definitions (17)

Definition 1: Inference by multinomial label
Definition 2: Inference by probability scores
Definition 3: Pointwise-Certified Robustness
Definition 4: Approximate-DP
Definition 5: Rényi divergence
Definition 6: Rényi Differential Privacy
Definition 7: Outcomes guarantee
Theorem 8: Improved Rényi-DP group privacy under the SGM
proof
Lemma 9: Pointwise outcomes guarantee
...and 7 more
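
For orientation, Definitions 5 and 6 in the list above refer to standard notions that are commonly written as below. This is the usual textbook form, given here as an assumption; the paper's own statements may use different notation or conditions. The ball $\mathcal{B}(\mathcal{D}_1, 1)$ is taken from the statement of Theorem 8, and $\epsilon$ is a generic privacy parameter introduced only for this sketch.

```latex
% Rényi divergence of order \alpha > 1 between distributions P and Q
D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1}
  \log \mathbb{E}_{x \sim Q}\!\left[ \left( \frac{P(x)}{Q(x)} \right)^{\alpha} \right]

% (\alpha, \epsilon)-Rényi-DP for a randomised function M
D_\alpha\big( M(\mathcal{D}_1) \,\|\, M(\mathcal{D}_2) \big) \le \epsilon
  \quad \text{for all } \mathcal{D}_1 \text{ and } \mathcal{D}_2 \in \mathcal{B}(\mathcal{D}_1, 1)
```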
