Et Tu Certifications: Robustness Certificates Yield Better Adversarial Examples

Andrew C. Cullen, Shijie Liu, Paul Montague, Sarah M. Erfani, Benjamin I. P. Rubinstein

TL;DR

The paper demonstrates a counterintuitive risk: robustness certificates can be exploited to craft norm-minimising adversarial examples more efficiently, challenging the notion that published certifications universally boost security. It introduces the Certification Aware Attack (CAA), a two-stage framework that first uses certification radii to navigate the input space more effectively and then refines the adversarial perturbation while preserving the attack's target label. Empirical results across MNIST, CIFAR-10, and ImageNet show that CAA finds adversarial examples faster and with smaller perturbations than established attacks on models protected by randomized smoothing and IBP-based certifiers, with substantial reductions in the median attack size relative to the certified bounds. The work emphasizes that releasing certifications can inadvertently increase the attack surface, and discusses mitigation strategies such as withholding certification details and relying on class-level disclosures. It also offers a framework for assessing the tightness of certification bounds in practice.

Abstract

In guaranteeing the absence of adversarial examples in an instance's neighbourhood, certification mechanisms play an important role in demonstrating neural net robustness. In this paper, we ask whether these certifications can compromise the very models they help to protect. Our new \emph{Certification Aware Attack} exploits certifications to produce computationally efficient norm-minimising adversarial examples $74\%$ more often than comparable attacks, while reducing the median perturbation norm by more than $10\%$. While these attacks can be used to assess the tightness of certification bounds, they also highlight that releasing certifications can paradoxically reduce security.

Paper Structure (25 sections, 17 equations, 8 figures, 4 tables, 1 algorithm)

Figures (8)

  • Figure 1: Illustrative example of an evasion attack on a binary classifier that changes the output from blue to red. Our new attack framework exploits knowledge of the certifications (circles) to minimise the number of iterative steps required.
  • Figure 2: Minimum achievable average percentage difference between the attack radii and the certified guarantee of Cohen et al. (Equation \ref{eqn:Cohen_Bound}) for a given success rate for our technique (blue), PGD (red), and Carlini-Wagner (green), when tested against MNIST, CIFAR-10 and ImageNet. Solid and dashed lines represent $\sigma=\{0.5, 1.0\}$ for the parameter space of Table \ref{tab:parameter_table}.
  • Figure 3: Best achieved Attack Proportion for our new Certification Aware Attack (blue), PGD (red), DeepFool (cyan), Carlini-Wagner (green), and AutoAttack (magenta), where the rows correspond to $\sigma = \{0.5, 1.0\}$ and the columns correspond to MNIST, CIFAR-10 and ImageNet. The black dotted line represents the best-case performance as per Equation \ref{eqn:Cohen_Bound}.
  • Figure 4: Success rates for our attack (blue) and PGD (red) for an IBP-certified MNIST model.
  • Figure 5: Response of key metrics for our Certification Aware Attack to changes in $\epsilon_{\text{min}}$, $\epsilon_{\text{max}}$ and $\delta$.
  • ...and 3 more figures
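
The captions for Figures 2 and 3 compare attack radii against the certified guarantee of Cohen et al. As a rough illustration of what that guarantee computes, the sketch below evaluates the standard randomized-smoothing $\ell_2$ radius $R = \tfrac{\sigma}{2}\left(\Phi^{-1}(p_A) - \Phi^{-1}(p_B)\right)$, where $p_A$ and $p_B$ are the (bounds on the) top-two class probabilities under Gaussian noise of scale $\sigma$. The function names are hypothetical, and `clamped_step` is only an assumed illustration of how a certified radius might be turned into a bounded step size using the $\epsilon_{\text{min}}$, $\epsilon_{\text{max}}$ and $\delta$ parameters named in Figure 5; this summary does not give the paper's actual update rule.

```python
from statistics import NormalDist


def cohen_certified_radius(p_a: float, p_b: float, sigma: float) -> float:
    """L2 certified radius for randomized smoothing (Cohen et al.).

    p_a: lower bound on the top-class probability under noise.
    p_b: upper bound on the runner-up class probability under noise.
    sigma: standard deviation of the Gaussian smoothing noise.
    """
    if not (0.0 < p_b <= p_a < 1.0):
        raise ValueError("require 0 < p_b <= p_a < 1")
    inv_cdf = NormalDist().inv_cdf  # inverse standard normal CDF
    return 0.5 * sigma * (inv_cdf(p_a) - inv_cdf(p_b))


def clamped_step(radius: float, delta: float, eps_min: float, eps_max: float) -> float:
    """ASSUMPTION, illustration only: scale the certified radius by delta
    and clamp the result to [eps_min, eps_max] to get a step size."""
    return max(eps_min, min(eps_max, delta * radius))


if __name__ == "__main__":
    r = cohen_certified_radius(p_a=0.9, p_b=0.1, sigma=0.5)
    print(f"certified radius: {r:.4f}")
    print(f"step size: {clamped_step(r, delta=1.05, eps_min=0.01, eps_max=0.5):.4f}")
```

The intuition from Figure 1 is that no adversarial example can lie inside the certified ball, so an attacker who knows the radius can safely take a step of at least that size rather than many small ones.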