Adversarial Robustness May Be at Odds With Simplicity

Preetum Nakkiran

TL;DR

The note argues that adversarial robustness may be at odds with simplicity. It gives explicit constructions in which a simple classifier is highly accurate on standard data and under random $\ell_\infty$ noise, yet every simple classifier is non-robust to adversarial $\ell_\infty$ perturbations. In these examples robust classification is possible, but only with more complex classifiers (exponentially more complex in some cases), and a quantitative trade-off between robustness and standard accuracy arises within the restricted class of simple classifiers. The results support the idea that robustness-accuracy trade-offs observed in practice may stem from limitations of the classifier class rather than from intrinsic task difficulty. Overall, the note provides theoretical examples in which Hypothesis (C), that robust classification may require more complex classifiers, accounts for both the robustness gap and the observed trade-off.

Abstract

Current techniques in machine learning are so far unable to learn classifiers that are robust to adversarial perturbations. However, they are able to learn non-robust classifiers with very high accuracy, even in the presence of random perturbations. Towards explaining this gap, we highlight the hypothesis that $\textit{robust classification may require more complex classifiers (i.e. more capacity) than standard classification.}$ In this note, we show that this hypothesis is indeed possible, by giving several theoretical examples of classification tasks and sets of "simple" classifiers for which: (1) There exists a simple classifier with high standard accuracy, and also high accuracy under random $\ell_\infty$ noise. (2) Any simple classifier is not robust: it must have high adversarial loss with $\ell_\infty$ perturbations. (3) Robust classification is possible, but only with more complex classifiers (exponentially more complex, in some examples). Moreover, $\textit{there is a quantitative trade-off between robustness and standard accuracy among simple classifiers.}$ This suggests an alternative explanation of the trade-off that appears in practice: it may occur not because the classification task inherently requires it (as in [Tsipras-Santurkar-Engstrom-Turner-Madry '18]), but because the structure of our current classifiers imposes it.
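Properties (1) to (3) can be made concrete with a small simulation. The sketch below is a toy illustration only and is not claimed to be the note's own construction: the data distribution, the parameters `n`, `p`, `eps`, and every function name are assumptions chosen for this example. It also checks a single linear classifier (the sign of the coordinate sum), so it illustrates property (2) for that one classifier and does not prove it for all simple classifiers.

```python
import random

def sample(n, p, rng):
    """Draw (x, y): y is a uniform +/-1 label; each coordinate of x
    equals y with probability p and is 0 otherwise (assumed toy data)."""
    y = rng.choice((-1, 1))
    x = [y if rng.random() < p else 0 for _ in range(n)]
    return x, y

def sign(v):
    return 1 if v > 0 else -1

def linear_clf(x):
    """'Simple' classifier: sign of the coordinate sum."""
    return sign(sum(x))

def rounding_clf(x):
    """More complex classifier: round each coordinate to the nearest
    integer, then take the sign of the sum."""
    return sign(sum(round(v) for v in x))

def no_noise(x, y, eps, rng):
    return x

def random_noise(x, y, eps, rng):
    """Independent uniform noise in [-eps, eps] on every coordinate."""
    return [v + rng.uniform(-eps, eps) for v in x]

def adversarial_shift(x, y, eps, rng):
    """An l_inf-bounded attack aimed at the linear classifier:
    move every coordinate by eps against the true label."""
    return [v - y * eps for v in x]

def accuracy(clf, perturb, n=200, p=0.2, eps=0.4, trials=500, seed=0):
    """Fraction of fresh samples that clf labels correctly after perturb."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = sample(n, p, rng)
        hits += clf(perturb(x, y, eps, rng)) == y
    return hits / trials

if __name__ == "__main__":
    print("linear, clean:        ", accuracy(linear_clf, no_noise))
    print("linear, random noise: ", accuracy(linear_clf, random_noise))
    print("linear, adversarial:  ", accuracy(linear_clf, adversarial_shift))
    print("rounding, adversarial:", accuracy(rounding_clf, adversarial_shift))
```

With these assumed parameters the expected signal in the sum is about `p * n = 40`, while the adversary can move the sum by `eps * n = 80`, which is why the linear rule fails under attack. Because `eps < 0.5`, rounding recovers every coordinate exactly, so the rounding classifier is unaffected by any perturbation within the budget.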
