Stochastic Activation Pruning for Robust Adversarial Defense

Guneet S. Dhillon, Kamyar Azizzadenesheli, Zachary C. Lipton, Jeremy Bernstein, Jean Kossaifi, Aran Khanna, Anima Anandkumar

TL;DR

This work introduces Stochastic Activation Pruning (SAP), a post-hoc defense that stochastically prunes activations in pretrained networks, sampling which activations to keep in proportion to their magnitude and scaling up the survivors to compensate. Framed as a mixed strategy in a minimax game, SAP offers robustness against adversarial perturbations without fine-tuning. Empirical results on CIFAR-10 and Atari show that SAP improves accuracy and calibration under small perturbations, and that combining it with adversarial training boosts robustness further. The approach outperforms other stochastic baselines and needs no retraining to deploy, while an attacker must estimate gradients by Monte Carlo sampling, which makes iterative attacks more expensive and noisier.

Abstract

Neural networks are known to be vulnerable to adversarial examples. Carefully chosen perturbations to real images, while imperceptible to humans, induce misclassification and threaten the reliability of deep learning systems in the wild. To guard against adversarial examples, we take inspiration from game theory and cast the problem as a minimax zero-sum game between the adversary and the model. In general, for such games, the optimal strategy for both players requires a stochastic policy, also known as a mixed strategy. In this light, we propose Stochastic Activation Pruning (SAP), a mixed strategy for adversarial defense. SAP prunes a random subset of activations (preferentially pruning those with smaller magnitude) and scales up the survivors to compensate. We can apply SAP to pretrained networks, including adversarially trained models, without fine-tuning, providing robustness against adversarial examples. Experiments demonstrate that SAP confers robustness against attacks, increasing accuracy and preserving calibration.
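
The pruning step described in the abstract can be sketched for a single layer as follows. This is a minimal NumPy illustration, not the authors' reference implementation. It assumes that activations are sampled with replacement from a multinomial distribution proportional to their absolute value, and that each survivor is divided by its probability of being kept at least once, so the layer output is unchanged in expectation. The names `sap_layer` and `num_samples` are hypothetical.

```python
import numpy as np


def sap_layer(h, num_samples, rng):
    """Stochastically prune a 1-D activation vector (illustrative sketch).

    Assumption: activation j is sampled with probability
    p_j = |h_j| / sum_k |h_k|, and survivors are rescaled by the inverse
    of their keep probability 1 - (1 - p_j) ** num_samples.
    """
    h = np.asarray(h, dtype=float)
    abs_h = np.abs(h)
    total = abs_h.sum()
    if total == 0.0:
        # Nothing to sample from: every activation is already zero.
        return h.copy()

    # Sampling probabilities proportional to activation magnitude.
    p = abs_h / total

    # Draw num_samples indices with replacement; keep any index drawn at least once.
    counts = rng.multinomial(num_samples, p)
    kept = counts > 0

    # Probability that an activation survives at least one of the draws.
    keep_prob = 1.0 - (1.0 - p) ** num_samples

    # Zero the pruned activations and scale up the survivors.
    out = np.zeros_like(h)
    out[kept] = h[kept] / keep_prob[kept]
    return out


rng = np.random.default_rng(0)
activations = np.array([0.0, 0.1, -2.0, 3.0])
# Each call draws a fresh mask, so repeated calls generally differ.
print(sap_layer(activations, num_samples=2, rng=rng))
```

Because a new mask is drawn on every forward pass, the defender plays a mixed strategy. Small-magnitude activations are the ones most likely to be dropped, and the rescaling keeps the expected output of the layer equal to the unpruned activations.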

Paper Structure

This paper contains 24 sections, 16 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: Accuracy plots of a variety of attacks against the dense model and SAP models with different perturbation strengths, $\lambda$. For the SAP-$\tau$ models, $\tau$ denotes the percentage of samples drawn from the multinomial distribution at each layer. $(a)$ SAP models tested against random perturbation. $(b)$ SAP models tested against the FGSM attack, using MC sampling. $(c)$ SAP-$100$ tested against an iterative adversarial attack, using MC sampling (legend shows defender vs. adversary). It is worth restating that computing the iterative attack against SAP models is much more expensive and noisier than computing it against dense models.
  • Figure 2: Robustness of dropout models, with different rates of dropout (denoted in the legends), against adversarial attacks, using MC sampling, with a variety of perturbation strengths, $\lambda$: $(a)$ dropout is applied to the pretrained models during validation; $(b)$ the models are trained using dropout, and dropout is applied during validation; $(c)$ the models are trained using dropout, but dropout is not applied during validation.
  • Figure 3: Accuracy plots of the dense, SAP-$100$, ADV and ADV$+$SAP-$100$ models, against adversarial attacks, using MC sampling, with a variety of perturbation strengths, $\lambda$.
  • Figure 4: Calibration plots of the dense, SAP-$100$, ADV and ADV$+$SAP-$100$ models, against adversarial attacks, using MC sampling, with a variety of perturbation strengths, $\lambda$. These plots show the relation between the confidence level of the model's output and its accuracy.
  • Figure 5:
  • ...and 1 more figure