Adversarially Robust Generalization Requires More Data
Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, Aleksander Mądry
TL;DR
This work treats adversarial robustness as a generalization problem and proves information-theoretic gaps between standard and robust learning in two simple data models: a Gaussian mixture and a Bernoulli model. Its upper and lower bounds show that robust generalization can require substantially more data than standard generalization. In the Gaussian setting, robust learning needs roughly $n \ge C \varepsilon^2 \sqrt{d}/\log d$ samples, where $d$ is the input dimension, $\varepsilon$ is the perturbation size, and $C$ is a constant; the Bernoulli case exhibits a related, model-dependent gap. A key theoretical insight is that non-linear strategies such as thresholding can sharply reduce the sample requirements of robust learning for some distributions, a claim supported by MNIST experiments in which explicit thresholding yields improved robustness with less data. Complementary experiments on MNIST, CIFAR-10, and SVHN show the practical relevance: robust accuracy lags standard accuracy across data regimes, and the data demands of robustness help explain why current methods struggle on complex datasets like CIFAR-10.
Abstract
Machine learning models are often susceptible to adversarial perturbations of their inputs. Even small perturbations can cause state-of-the-art classifiers with high "standard" accuracy to produce an incorrect prediction with high confidence. To better understand this phenomenon, we study adversarially robust learning from the viewpoint of generalization. We show that already in a simple natural data model, the sample complexity of robust learning can be significantly larger than that of "standard" learning. This gap is information theoretic and holds irrespective of the training algorithm or the model family. We complement our theoretical results with experiments on popular image classification datasets and show that a similar gap exists here as well. We postulate that the difficulty of training robust classifiers stems, at least partially, from this inherently larger sample complexity.
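To make the gap between standard and robust accuracy concrete, the following is a minimal NumPy sketch of a Gaussian mixture setting like the one mentioned in the TL;DR. It assumes labels $y$ uniform in $\{-1, +1\}$, inputs $x \sim \mathcal{N}(y\theta, \sigma^2 I)$, and a linear classifier whose weights are the mean of $y_i x_i$, evaluated under worst-case $\ell_\infty$ perturbations of size $\varepsilon$. The function names and the parameter values (`d`, `sigma`, `eps`, the all-ones `theta`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def sample_gaussian_model(n, theta, sigma, rng):
    """Draw n labeled points: y uniform in {-1, +1}, x ~ N(y * theta, sigma^2 I)."""
    y = rng.choice(np.array([-1.0, 1.0]), size=n)
    noise = sigma * rng.standard_normal((n, theta.size))
    return y[:, None] * theta + noise, y


def fit_mean_classifier(x, y):
    """Weight vector of the linear classifier sign(<w, x>): the mean of y_i * x_i."""
    return (y[:, None] * x).mean(axis=0)


def standard_and_robust_accuracy(w, x, y, eps):
    """Accuracy of sign(<w, x>) without and with worst-case l_inf perturbations.

    For a linear classifier, an adversary with l_inf budget eps can lower the
    margin y * <w, x> by exactly eps * ||w||_1 (by moving every coordinate
    against sign(w)), so a point is robustly correct when its margin exceeds
    that amount.
    """
    margins = y * (x @ w)
    standard = float(np.mean(margins > 0))
    robust = float(np.mean(margins > eps * np.abs(w).sum()))
    return standard, robust


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, sigma, eps = 400, 1.0, 0.9  # illustrative values, not from the paper
    theta = np.ones(d)
    x_test, y_test = sample_gaussian_model(2000, theta, sigma, rng)
    for n in (1, 10, 100):
        x_train, y_train = sample_gaussian_model(n, theta, sigma, rng)
        w = fit_mean_classifier(x_train, y_train)
        print(n, standard_and_robust_accuracy(w, x_test, y_test, eps))
```

In this toy setup one would expect standard accuracy to be high even with a single training sample, while robust accuracy at a fixed `eps` improves only as `n` grows. This is a simulation of one estimator under assumed parameters, not a reproduction of the paper's bounds.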
