Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples

Sven Gowal, Chongli Qin, Jonathan Uesato, Timothy Mann, Pushmeet Kohli

TL;DR

The study systematically examines the limits of adversarial training by varying the training loss, model size, activation function, and use of unlabeled data. It shows that combining TRADES, model weight averaging, larger Wide-ResNets, Swish/SiLU activations, and pseudo-labeled unlabeled data yields substantial robustness gains and sets new state-of-the-art baselines on CIFAR-10 and CIFAR-100 under norm-bounded perturbations. On CIFAR-10 against ℓ∞ perturbations of size 8/255, the best models reach 65.88% robust accuracy with additional unlabeled data (+6.35% over prior art) and 57.20% without it (+3.46%). The same recipe, without further modification, reaches 80.53% (+7.62%) against ℓ2 perturbations of size 128/255 on CIFAR-10 and 36.88% (+8.46%) against ℓ∞ perturbations of size 8/255 on CIFAR-100. The work offers practical baselines for future robustness research and shows that large gains can come from combining several modest improvements rather than from a single breakthrough.

Abstract

Adversarial training and its variants have become de facto standards for learning robust deep neural networks. In this paper, we explore the landscape around adversarial training in a bid to uncover its limits. We systematically study the effect of different training losses, model sizes, activation functions, the addition of unlabeled data (through pseudo-labeling) and other factors on adversarial robustness. We discover that it is possible to train robust models that go well beyond state-of-the-art results by combining larger models, Swish/SiLU activations and model weight averaging. We demonstrate large improvements on CIFAR-10 and CIFAR-100 against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $8/255$ and $128/255$, respectively. In the setting with additional unlabeled data, we obtain an accuracy under attack of 65.88% against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-10 (+6.35% with respect to prior art). Without additional data, we obtain an accuracy under attack of 57.20% (+3.46%). To test the generality of our findings and without any additional modifications, we obtain an accuracy under attack of 80.53% (+7.62%) against $\ell_2$ perturbations of size $128/255$ on CIFAR-10, and of 36.88% (+8.46%) against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-100. All models are available at https://github.com/deepmind/deepmind-research/tree/master/adversarial_robustness.
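
To make the threat model in the abstract concrete, the following is a minimal sketch of what "norm-bounded perturbations of size $8/255$ and $128/255$" means in practice. It assumes pixel values scaled to $[0, 1]$; the function names (`project_linf`, `project_l2`, `apply_perturbation`) and the random stand-in image are illustrative and do not come from the paper or its released code.

```python
import numpy as np

EPS_LINF = 8 / 255    # l_inf budget quoted in the abstract
EPS_L2 = 128 / 255    # l_2 budget quoted in the abstract


def project_linf(delta, eps):
    """Clip every coordinate of the perturbation to [-eps, eps]."""
    return np.clip(delta, -eps, eps)


def project_l2(delta, eps):
    """Rescale the perturbation so its Euclidean norm is at most eps."""
    norm = np.linalg.norm(delta)
    if norm <= eps:
        return delta
    return delta * (eps / norm)


def apply_perturbation(x, delta):
    """Add the perturbation and keep pixels in [0, 1] (assumed input range)."""
    return np.clip(x + delta, 0.0, 1.0)


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(32, 32, 3))   # stand-in for one CIFAR-sized image
raw = rng.normal(0.0, 0.1, size=x.shape)      # arbitrary unconstrained perturbation

delta_inf = project_linf(raw, EPS_LINF)       # admissible under the l_inf budget
delta_l2 = project_l2(raw, EPS_L2)            # admissible under the l_2 budget
x_adv = apply_perturbation(x, delta_inf)
```

An attack searches within these sets for the perturbation that most increases the loss; "accuracy under attack" is then the fraction of test inputs that remain correctly classified.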

Paper Structure

This paper contains 88 sections, 8 equations, 11 figures, 18 tables.

Figures (11)

  • Figure 1: Accuracy of various models, ordered by publication date, against AutoAttack (Croce and Hein, 2020) on CIFAR-10 with $\ell_\infty$ perturbations of size $8/255$. Our newest models (on the far right) improve robust accuracy by +3.46% without additional data and by +6.35% when using additional unlabeled data.
  • Figure 2: Accuracy under $\ell_\infty$ attacks of size $\epsilon = 8/255$ on CIFAR-10 as we vary the ratio of labeled-to-unlabeled data.
  • Figure 3: Clean accuracy and accuracy under $\ell_\infty$ attacks of size $\epsilon = 8/255$ on CIFAR-10 as the network architecture changes. The first panel restricts the available data to CIFAR-10, while the second panel uses 500K additional unlabeled images extracted from 80M-Ti.
  • Figure 4: Accuracy under $\ell_\infty$ attacks of size $\epsilon = 8/255$ on CIFAR-10 when using model weight averaging (WA). The first panel shows the final robust accuracy obtained for different values of the decay rate $\tau$, for the settings without (blue, left y-axis) and with (orange, right y-axis) additional unlabeled data. The second panel shows the evolution of the robust accuracy as training progresses.
  • Figure 5: Clean accuracy and accuracy under $\ell_\infty$ attacks of size $\epsilon = 8/255$ on CIFAR-10 for different activation functions. The first panel restricts the available data to CIFAR-10, while the second panel uses 500K additional unlabeled images extracted from 80M-Ti.
  • ...and 6 more figures
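
The decay rate $\tau$ in the Figure 4 caption can be illustrated with a short sketch of model weight averaging. It assumes the common exponential-moving-average form, $\theta' \leftarrow \tau \theta' + (1 - \tau)\theta$; the helper name `update_average`, the toy parameter dictionary, the noise standing in for optimizer steps, and the value of `tau` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def update_average(avg_params, params, tau):
    """One weight-averaging step: avg <- tau * avg + (1 - tau) * params."""
    return {name: tau * avg_params[name] + (1.0 - tau) * params[name]
            for name in params}


rng = np.random.default_rng(0)
params = {"w": rng.normal(size=(4, 4)), "b": np.zeros(4)}
# Start the average at the initial parameters.
avg_params = {name: value.copy() for name, value in params.items()}

tau = 0.995  # illustrative decay rate, not a value taken from the source
for step in range(100):
    # Stand-in for an optimizer step: nudge the parameters with small noise.
    params = {name: value + 0.01 * rng.normal(size=value.shape)
              for name, value in params.items()}
    avg_params = update_average(avg_params, params, tau)
```

At evaluation time the averaged parameters `avg_params` would be used in place of `params`. A value of $\tau$ closer to 1 averages over a longer window of training iterates, which is the knob the first panel of Figure 4 varies.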