Adversarially Robust Generalization Requires More Data

Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, Aleksander Mądry

TL;DR

This work treats adversarial robustness as a generalization problem and proves information-theoretic gaps between standard and robust learning in two simple data models: a Gaussian mixture and a Bernoulli model. It derives upper and lower bounds showing that robust generalization can require far more data than standard generalization, with a sample requirement of roughly $n \ge C \varepsilon^2 \sqrt{d}/\log d$ in the Gaussian setting, and demonstrates a related, model-dependent gap in the Bernoulli case. A key theoretical insight is that non-linear strategies (e.g., thresholding) can sharply reduce the number of samples needed for robustness in some distributions, a claim supported by MNIST experiments in which explicit thresholding filters improve robustness with less data. Complementary experiments on MNIST, CIFAR-10, and SVHN show the practical relevance: robust accuracy lags standard accuracy across data regimes, and the data demands of robustness help explain why current methods struggle on complex datasets like CIFAR-10.

Abstract

Machine learning models are often susceptible to adversarial perturbations of their inputs. Even small perturbations can cause state-of-the-art classifiers with high "standard" accuracy to produce an incorrect prediction with high confidence. To better understand this phenomenon, we study adversarially robust learning from the viewpoint of generalization. We show that already in a simple natural data model, the sample complexity of robust learning can be significantly larger than that of "standard" learning. This gap is information theoretic and holds irrespective of the training algorithm or the model family. We complement our theoretical results with experiments on popular image classification datasets and show that a similar gap exists here as well. We postulate that the difficulty of training robust classifiers stems, at least partially, from this inherently larger sample complexity.

Paper Structure

This paper contains 30 sections, 27 theorems, 118 equations, 5 figures.

Key Result

theorem 4

Let $(x, y)$ be drawn from a $(\theta^\star, \sigma)$-Gaussian model with $\lVert\theta^\star\rVert_2 = \sqrt{d}$ and $\sigma \, \leq \, c \cdot d^{1/4}$ where $c$ is a universal constant. Let $\widehat{w} \in\mathbb{R}^d$ be the vector $\widehat{w} = y \cdot x$. Then with high probability, the line
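
The setup of this theorem can be explored numerically. The sketch below is not the authors' code. It assumes the usual form of the Gaussian model (label $y$ uniform on $\{-1, +1\}$, $x \sim \mathcal{N}(y \cdot \theta^\star, \sigma^2 I)$) and a linear classifier $f_w(x) = \mathrm{sign}(\langle w, x \rangle)$. The function names, the dimension `d = 400`, and the constant `c = 0.5` are hypothetical choices for illustration, since the statement above leaves the universal constant $c$ unspecified. It builds $\widehat{w} = y \cdot x$ from a single sample and estimates the standard (non-robust) error on fresh draws.

```python
import random


def sample_gaussian_model(theta, sigma, rng):
    """Draw (x, y): y uniform on {-1, +1}, x ~ N(y * theta, sigma^2 I)."""
    y = rng.choice((-1, 1))
    x = [y * t + sigma * rng.gauss(0.0, 1.0) for t in theta]
    return x, y


def predict(w, x):
    """Linear classifier f_w(x) = sign(<w, x>), with ties mapped to +1."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= 0 else -1


def one_sample_error(d=400, c=0.5, n_test=2000, seed=0):
    """Estimate the standard error of w_hat = y * x built from one sample."""
    rng = random.Random(seed)
    theta = [1.0] * d          # all-ones vector, so ||theta||_2 = sqrt(d)
    sigma = c * d ** 0.25      # sigma = c * d^(1/4); c = 0.5 is an assumed value
    x0, y0 = sample_gaussian_model(theta, sigma, rng)
    w_hat = [y0 * xi for xi in x0]   # the single-sample estimator from the theorem
    mistakes = 0
    for _ in range(n_test):
        x, y = sample_gaussian_model(theta, sigma, rng)
        mistakes += predict(w_hat, x) != y
    return mistakes / n_test


error = one_sample_error()
print(f"empirical standard error of the one-sample classifier: {error:.4f}")
```

Raising `c` in this sketch makes the noise dominate the signal and the estimated error grows, which gives a feel for why the theorem bounds $\sigma$ by a constant multiple of $d^{1/4}$.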

Figures (5)

  • Figure 1: Classification accuracies for robust optimization on MNIST and CIFAR10. In both cases, we trained standard convolutional networks to be robust to $\ell_\infty$-perturbations of the input. On MNIST, the robust test error closely tracks the corresponding training error and the model achieves high robust accuracy. On CIFAR10, the model still achieves a good natural (non-adversarial) test error, but there is a significant generalization gap for the robust accuracy. This phenomenon motivates our study of adversarially robust generalization.
  • Figure 2: Adversarially robust generalization performance as a function of training data size for $\ell_\infty$ adversaries on the MNIST, CIFAR-10 and SVHN datasets. For each choice of training set size and $\varepsilon_{test}$, we plot the best performance achieved over $\varepsilon_{train}$ and network capacity. This clearly shows that achieving a certain level of adversarially robust generalization requires significantly more samples than achieving the same level of standard generalization.
  • Figure 3: Adversarial robustness to $\ell_\infty$ attacks on the MNIST dataset for a simple convolutional network (Madry et al., 2017) with and without explicit thresholding filters. For each choice of training set size and $\varepsilon_{test}$, we report the best test set accuracy achieved over the choice of thresholding filters and $\varepsilon_{train}$. We observe that introducing thresholding filters significantly reduces the number of samples needed to achieve good adversarial generalization.
  • Figure 4: Complete experiments for adversarially robust generalization for $\ell_\infty$ adversaries. For each dataset and training $\varepsilon$ we report the performance of the corresponding classifier for each testing $\varepsilon$. We observe that the best performance on natural examples is achieved through natural training and the best adversarial performance is achieved when training with the largest $\varepsilon_{train}$ considered.
  • Figure 5: Complete experiments for adversarially robust generalization for $\ell_\infty$ adversaries for standard networks (top row) and networks with thresholding (bottom row) for the MNIST dataset. Thresholding corresponds to replacing the first convolutional layer with two channels computing $\textrm{ReLU}(x - \varepsilon)$ and $\textrm{ReLU}(x - (1-\varepsilon))$. For each training $\varepsilon_{train}$ we report the performance of the corresponding classifier for each testing $\varepsilon_{test}$. For natural training, we use thresholding filters identical to those used for $\varepsilon_{train}=0.1$. We observe that in each case, explicitly encoding thresholding filters in the network architecture boosts the adversarial robustness for a given training $\varepsilon_{train}$ and training set size.
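
The thresholding filters mentioned in the Figure 5 caption can be illustrated with a small sketch. This is not the authors' implementation: it applies the two ReLU channels pixel by pixel to a toy list of intensities in $[0, 1]$, and the names `threshold_channels` and `perturb`, the toy image, and the use of random rather than worst-case perturbations are all assumptions made for illustration. It shows that a background pixel at intensity 0 maps to 0 in both channels under any $\ell_\infty$ perturbation of size at most $\varepsilon$.

```python
import random


def relu(v):
    """Rectified linear unit on a scalar."""
    return max(v, 0.0)


def threshold_channels(pixels, eps):
    """Return the two channels ReLU(x - eps) and ReLU(x - (1 - eps))."""
    low = [relu(p - eps) for p in pixels]
    high = [relu(p - (1.0 - eps)) for p in pixels]
    return low, high


def perturb(pixels, eps, rng):
    """Random l_inf perturbation of size at most eps, clipped to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.uniform(-eps, eps))) for p in pixels]


eps = 0.1
rng = random.Random(0)
image = [0.0, 0.0, 1.0, 0.0, 1.0]   # toy near-binary "image" (assumed example)
background = [i for i, p in enumerate(image) if p == 0.0]

adversarial = perturb(image, eps, rng)
low_adv, high_adv = threshold_channels(adversarial, eps)

# Background pixels stay at or below eps after the perturbation,
# so both thresholded channels remain exactly zero there.
print([low_adv[i] for i in background])
print([high_adv[i] for i in background])
```

Foreground pixels are not made invariant by these filters. The second channel only compresses their perturbed values into the range $[0, \varepsilon]$, so the sketch should be read as an intuition for the caption rather than a robustness guarantee.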

Theorems & Definitions (52)

  • definition 1: Gaussian model
  • definition 2: Classification error
  • definition 3: Robust classification error
  • theorem 4
  • theorem 5
  • theorem 6
  • definition 7: Bernoulli model
  • theorem 8
  • theorem 9
  • theorem 10
  • ...and 42 more