Certified Robustness to Adversarial Examples with Differential Privacy

Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Suman Jana

TL;DR

This work establishes a formal connection between differential privacy and robustness to norm-bounded adversarial perturbations, and leverages it to build PixelDP, a scalable certified defense. PixelDP introduces a calibrated DP noise layer to make the network's output distribution $(\epsilon,\delta)$-PixelDP, enabling robustness certificates for individual predictions against $p$-norm attacks via an expected-output stability bound. The authors demonstrate the approach on large-scale models and datasets, including Inception-v3 on ImageNet, using an autoencoder-based deployment to avoid retraining the full network, and report meaningful certified robustness alongside competitive accuracy under attack when compared to state-of-the-art defenses. The results highlight DP's post-processing flexibility and the practical potential of certified defenses for real-world, large-scale vision systems, while outlining trade-offs between noise level, certified accuracy, and computational overhead.
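
To make "calibrated" concrete, here is a minimal sketch of how a noise scale could be chosen, assuming the standard Gaussian mechanism from the DP literature (valid for $\epsilon \le 1$) with the sensitivity multiplied by the construction attack bound $L$. The function name and parameters are hypothetical and are not taken from the PixelDP code.

```python
import math


def gaussian_noise_std(sensitivity: float, attack_bound: float,
                       eps: float, delta: float) -> float:
    """Standard deviation for the Gaussian mechanism (hypothetical helper).

    sensitivity  -- sensitivity of the layers before the noise layer
    attack_bound -- construction attack bound L (in the chosen p-norm)
    eps, delta   -- target DP parameters; the classic analysis needs eps <= 1
    """
    if not 0.0 < eps <= 1.0:
        raise ValueError("the classic Gaussian mechanism analysis assumes 0 < eps <= 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    # sigma = sqrt(2 ln(1.25 / delta)) * sensitivity * L / eps
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity * attack_bound / eps


if __name__ == "__main__":
    for bound in (0.1, 0.3, 1.0):
        print(bound, gaussian_noise_std(1.0, bound, eps=1.0, delta=0.05))
```

Under this assumption the noise grows linearly with $L$, which is one way to read the trade-off between noise level and accuracy mentioned above.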

Abstract

Adversarial examples that fool machine learning models, particularly deep neural networks, have been a topic of intense research interest, with attacks and defenses being developed in a tight back-and-forth. Most past defenses are best effort and have been shown to be vulnerable to sophisticated attacks. Recently a set of certified defenses have been introduced, which provide guarantees of robustness to norm-bounded attacks, but they either do not scale to large datasets or are limited in the types of models they can support. This paper presents the first certified defense that both scales to large networks and datasets (such as Google's Inception network for ImageNet) and applies broadly to arbitrary model types. Our defense, called PixelDP, is based on a novel connection between robustness against adversarial examples and differential privacy, a cryptographically-inspired formalism, that provides a rigorous, generic, and flexible foundation for defense.

Paper Structure

This paper contains 26 sections, 6 theorems, 15 equations, 9 figures, 3 tables.

Key Result

Lemma 1

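(Expected Output Stability Bound) Suppose a randomized function $A$, with bounded output $A(x) \in [0,b], \ b \in \mathbb{R}^+$, satisfies $(\epsilon,\delta)$-DP. Then the expected value of its output meets the following property: $$\forall \alpha \in B_p(1): \quad \mathbb{E}(A(x)) \leq e^{\epsilon}\, \mathbb{E}(A(x + \alpha)) + b\delta,$$ where $B_p(1)$ is the $p$-norm ball of radius 1. The expectation is taken over the randomness in $A$.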
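
As an illustration, the sketch below turns the lemma into a two-sided interval, assuming the bound reads $\mathbb{E}(A(x)) \leq e^{\epsilon}\,\mathbb{E}(A(x+\alpha)) + b\delta$ for every perturbation $\alpha$ with $\|\alpha\|_p \leq 1$. Because the bound holds for every input, it can be applied in both directions between $x$ and $x + \alpha$. The function name is hypothetical.

```python
import math
from typing import Tuple


def stability_bounds(expected: float, eps: float, delta: float,
                     b: float = 1.0) -> Tuple[float, float]:
    """Interval containing E[A(x + alpha)] given E[A(x)] = expected.

    Assumes A is (eps, delta)-DP with outputs in [0, b] and ||alpha||_p <= 1.
    """
    # From E[A(x)] <= e^eps * E[A(x + alpha)] + b * delta:
    lower = (expected - b * delta) / math.exp(eps)
    # From E[A(x + alpha)] <= e^eps * E[A(x)] + b * delta:
    upper = math.exp(eps) * expected + b * delta
    # The output is bounded, so the interval can be clipped to [0, b].
    return max(0.0, lower), min(b, upper)


if __name__ == "__main__":
    print(stability_bounds(0.8, eps=0.5, delta=0.05))
```

With $\epsilon = 0$ and $\delta = 0$ the interval collapses to the original expectation, and it widens as either parameter grows.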

Figures (9)

  • Figure 1: Architecture. (a) In blue, the original DNN. In red, the noise layer that provides the $(\epsilon, \delta)$-DP guarantees. The noise can be added to the inputs or any of the following layers, but the distribution is rescaled by the sensitivity $\Delta_{p,q}$ of the computation performed by each layer before the noise layer. The DNN is trained with the original loss and optimizer (e.g., Momentum stochastic gradient descent). Predictions repeatedly call the $(\epsilon, \delta)$-DP DNN to measure its empirical expectation over the scores. (b) After adding the bounds for the measurement error between the empirical and true expectation (green) and the stability bounds from Lemma 1 for a given attack size $L_{attack}$ (red), the prediction is certified robust to this attack size if the lower bound of the $\arg\max$ label does not overlap with the upper bound of any other labels.
  • Figure 2: Certified accuracy, varying the construction attack bound ($L$) and prediction robustness threshold ($T$), on ImageNet auto-encoder/Inception and CIFAR-10 ResNet, 2-norm bounds. Robust accuracy at high robustness thresholds (high $T$) increases with high-noise networks (high $L$). Low-noise networks are both more accurate and more certifiably robust for low $T$.
  • Figure 3: Accuracy under attack on ImageNet. For the ImageNet auto-encoder plus Inception-v3, $L \in \{0.1, 0.3, 1.0\}$, $2$-norm attacks. The PixelDP auto-encoder increases the robustness of Inception against $2$-norm attacks.
  • Figure 4: Accuracy under $2$-norm attack for PixelDP vs. Madry and RobustOpt, CIFAR-10 and SVHN. For $2$-norm attacks, PixelDP is on par with Madry until $L_{attack} \geq 1.2$; RobustOpt supports only small models and has lower accuracy.
  • Figure 5: PixelDP certified predictions vs. Madry accuracy, under attack, CIFAR-10 ResNets, $2$-norm attack. PixelDP makes fewer but more correct predictions up to $L_{attack} = 1.0$.
  • ...and 4 more figures
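
The certification check described in the Figure 1(b) caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: scores are assumed to lie in $[0, 1]$, the measurement error is bounded with a two-sided Hoeffding inequality and a union bound over labels (one possible choice), and `eps` and `delta` are assumed to be the DP parameters that hold for the attack size being certified. All function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def hoeffding_half_width(n_draws: int, n_labels: int, eta: float) -> float:
    """Half-width of two-sided Hoeffding intervals for scores in [0, 1].

    A union bound over labels makes all intervals hold jointly with
    probability at least 1 - eta.
    """
    return math.sqrt(math.log(2.0 * n_labels / eta) / (2.0 * n_draws))


def certify(draws: Sequence[Sequence[float]], eps: float, delta: float,
            eta: float = 0.05) -> Optional[int]:
    """Return the predicted label if it is certified robust, else None.

    draws[j][i] is the score of label i on the j-th noisy forward pass;
    at least two labels are expected.
    """
    n, k = len(draws), len(draws[0])
    means = [sum(row[i] for row in draws) / n for i in range(k)]
    err = hoeffding_half_width(n, k, eta)

    pred = max(range(k), key=lambda i: means[i])
    # Measurement-error bounds (green in Figure 1b).
    pred_lower = max(0.0, means[pred] - err)
    others_upper = max(min(1.0, means[i] + err) for i in range(k) if i != pred)

    # Stability bounds (red in Figure 1b): worst case under attack.
    worst_pred = (pred_lower - delta) / math.exp(eps)
    worst_other = math.exp(eps) * others_upper + delta
    return pred if worst_pred > worst_other else None


if __name__ == "__main__":
    confident = [[0.95, 0.03, 0.02]] * 2000
    ambiguous = [[0.50, 0.45, 0.05]] * 2000
    print(certify(confident, eps=0.5, delta=0.01))
    print(certify(ambiguous, eps=0.5, delta=0.01))
```

Returning `None` rather than an uncertified label mirrors the idea that a prediction is only certified when the two intervals do not overlap; a deployment could instead return the label together with a flag.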

Theorems & Definitions (11)

  • Lemma 1
  • proof
  • Corollary 1
  • proof
  • Proposition 1
  • proof
  • Proposition 2
  • Proposition
  • proof
  • Lemma 2
  • ...and 1 more