
Enhancing the Antidote: Improved Pointwise Certifications against Poisoning Attacks

Shijie Liu, Andrew C. Cullen, Paul Montague, Sarah M. Erfani, Benjamin I. P. Rubinstein

TL;DR

This work addresses poisoning attacks by delivering pointwise-certified robustness guarantees for individual test samples under bounded training-data changes up to radius $r$. It advances a general framework that combines Differential Privacy (ADP and Rényi-DP) with the Sampled Gaussian Mechanism and improved group privacy to certify a test instance against insertions, deletions, or modifications of training data. The approach supports both multinomial label predictions and score-based outputs, and demonstrates substantial gains in certified accuracy and maximum certification radius across MNIST, Fashion-MNIST, and CIFAR-10 relative to baselines. The combination of DP guarantees, sub-sampling, and bagging enables per-example robust guarantees with practical implications for deploying models in sensitive settings, albeit with notable computational costs due to training multiple model instances.

Abstract

Poisoning attacks can disproportionately influence model behaviour by making small changes to the training corpus. While defences against specific poisoning attacks do exist, they in general do not provide any guarantees, leaving them potentially countered by novel attacks. In contrast, by examining worst-case behaviours Certified Defences make it possible to provide guarantees of the robustness of a sample against adversarial attacks modifying a finite number of training samples, known as pointwise certification. We achieve this by exploiting both Differential Privacy and the Sampled Gaussian Mechanism to ensure the invariance of prediction for each testing instance against finite numbers of poisoned examples. In doing so, our model provides guarantees of adversarial robustness that are more than twice as large as those provided by prior certifications.
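
The Sampled Gaussian Mechanism named in the abstract can be illustrated with a minimal sketch. It assumes the standard formulation: each training record is kept independently with probability $q$, a sensitivity-bounded function is summed over the kept records, and Gaussian noise with standard deviation $\sigma$ is added. The function name, the scalar clipping to $[-1, 1]$, and the returned sample size are illustrative assumptions, not the authors' implementation.

```python
import random


def sampled_gaussian_mechanism(values, q, sigma, rng=None):
    """Hypothetical helper illustrating the Sampled Gaussian Mechanism.

    values: per-record scalar contributions (an assumption for illustration)
    q:      sample ratio, the probability that each record is kept
    sigma:  standard deviation of the added Gaussian noise
    Returns (noisy_sum, number_of_sampled_records).
    """
    rng = rng or random.Random()
    # Poisson sub-sampling: keep each record independently with probability q.
    sample = [v for v in values if rng.random() < q]
    # Clip each contribution to [-1, 1] so that one record changes the sum by at most 1.
    clipped_sum = sum(max(-1.0, min(1.0, v)) for v in sample)
    # Add zero-mean Gaussian noise with standard deviation sigma.
    return clipped_sum + rng.gauss(0.0, sigma), len(sample)


if __name__ == "__main__":
    noisy, kept = sampled_gaussian_mechanism([0.2, 1.7, -0.4, 0.9], q=0.5, sigma=1.0,
                                             rng=random.Random(0))
    print(kept, noisy)
```

Smaller $q$ and larger $\sigma$ both strengthen the privacy guarantee, and with it the certification, at the cost of a noisier output.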

Paper Structure (26 sections, 5 theorems, 38 equations, 2 figures, 2 tables)


Key Result

Theorem 8

If a randomised function $M$ obtained by SGM with sample ratio $q$ and noise level $\sigma$ achieves $(\alpha, \operatorname{SG}(\alpha, M, q, \sigma))$-Rényi-DP for all datasets $\mathcal{D}_1$ and $\mathcal{D}_2 \in \mathcal{B}(\mathcal{D}_1,1)$, then for all datasets $\mathcal{D}_3 \in \mathcal{B

Figures (2)

  • Figure 1: The left column contains certified accuracy plots for the method RDP-multinomial against different noise levels ($\sigma$); the right column contains certified accuracy plots for comparisons against variants and baselines. In the plots, the X-axis is radius $r$ (symmetric difference) while the Y-axis is the corresponding certified accuracy $CA_r$ at radius $r$.
  • Figure 2: Certified accuracy plots comparing RDP-multinomial with the proposed improved group privacy (RDP-multinomial) against RDP-multinomial with standard group privacy (RDP-multinomial-GP) on MNIST and Fashion-MNIST.
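
The quantity on the Y-axis of both figures, $CA_r$, can be computed along the following lines. This sketch assumes the usual definition of certified accuracy: the fraction of test samples that are classified correctly and whose certified radius is at least $r$. The function name, argument layout, and the use of "at least" rather than "strictly greater than" are assumptions for illustration; the paper's own definition is not reproduced on this page.

```python
def certified_accuracy(predictions, labels, certified_radii, r):
    """Hypothetical helper: fraction of test samples that are both correctly
    classified and certified at radius r or more (assumed definition of CA_r)."""
    if not labels:
        raise ValueError("at least one test sample is required")
    hits = sum(
        1
        for pred, label, radius in zip(predictions, labels, certified_radii)
        if pred == label and radius >= r
    )
    return hits / len(labels)


if __name__ == "__main__":
    preds = [1, 2, 3, 4]
    labels = [1, 2, 0, 4]
    radii = [5, 0, 9, 3]
    # Sweep the radius to obtain the points of a certified accuracy curve.
    for r in (0, 3, 6):
        print(r, certified_accuracy(preds, labels, radii, r))
```

Under this definition the curve is non-increasing in $r$, and its value at $r = 0$ is the ordinary test accuracy, provided no certified radius is negative.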

Theorems & Definitions (17)

  • Definition 1: Inference by multinomial label
  • Definition 2: Inference by probability scores
  • Definition 3: Pointwise-Certified Robustness
  • Definition 4: Approximate-DP
  • Definition 5: Rényi divergence
  • Definition 6: Rényi Differential Privacy
  • Definition 7: Outcomes guarantee
  • Theorem 8: Improved Rényi-DP group privacy under the SGM
  • proof
  • Lemma 9: Pointwise outcomes guarantee
  • ...and 7 more
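
For orientation, Definitions 4 to 6 are listed here by title only. Their standard textbook forms are given below, using the neighbourhood notation $\mathcal{B}(\mathcal{D}_1, 1)$ from Theorem 8. These are the usual statements from the differential privacy literature and are an assumption about what the paper states; its exact notation and conditions may differ.

```latex
% Approximate-DP (standard form): M is (\epsilon, \delta)-ADP if, for all
% \mathcal{D}_1, all \mathcal{D}_2 \in \mathcal{B}(\mathcal{D}_1, 1), and all measurable output sets S,
\Pr[M(\mathcal{D}_1) \in S] \le e^{\epsilon} \Pr[M(\mathcal{D}_2) \in S] + \delta

% Renyi divergence of order \alpha > 1 between distributions P and Q
D_{\alpha}(P \,\|\, Q) = \frac{1}{\alpha - 1}
  \log \mathbb{E}_{x \sim Q}\!\left[ \left( \frac{P(x)}{Q(x)} \right)^{\alpha} \right]

% Renyi-DP (standard form): M is (\alpha, \epsilon)-RDP if, for all
% \mathcal{D}_1 and all \mathcal{D}_2 \in \mathcal{B}(\mathcal{D}_1, 1),
D_{\alpha}\big( M(\mathcal{D}_1) \,\|\, M(\mathcal{D}_2) \big) \le \epsilon
```

In Theorem 8, the role of $\epsilon$ is played by $\operatorname{SG}(\alpha, M, q, \sigma)$, the Rényi-DP bound achieved by the SGM at order $\alpha$.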