Adversarial examples from computational constraints

Sébastien Bubeck, Eric Price, Ilya Razenshteyn

TL;DR

The paper argues that the adversarial vulnerability of high-dimensional classifiers likely stems from computational constraints rather than information-theoretic limits. First, it shows that robust learning is information-theoretically possible: for a broad set of classification tasks (under mild generative-model assumptions), whenever a robust classifier exists, one can be found from relatively few training examples, though possibly in exponential time. Second, it constructs a high-dimensional binary classification task that a simple linear separator learns efficiently in the non-robust sense, yet that no efficient algorithm in the statistical query (SQ) model can learn robustly, even for small perturbations. This gives an exponential separation between classical and robust learning in the SQ model, and suggests that adversarial examples may be an unavoidable byproduct of the computational limitations of learning algorithms.

Abstract

Why are classifiers in high dimension vulnerable to "adversarial" perturbations? We show that it is likely not due to information theoretic limitations, but rather it could be due to computational constraints. First we prove that, for a broad set of classification tasks, the mere existence of a robust classifier implies that it can be found by a possibly exponential-time algorithm with relatively few training examples. Then we give a particular classification task where learning a robust classifier is computationally intractable. More precisely we construct a binary classification task in high dimensional space which is (i) information theoretically easy to learn robustly for large perturbations, (ii) efficiently learnable (non-robustly) by a simple linear separator, (iii) yet is not efficiently robustly learnable, even for small perturbations, by any algorithm in the statistical query (SQ) model. This example gives an exponential separation between classical learning and robust learning in the statistical query model. It suggests that adversarial examples may be an unavoidable byproduct of computational limitations of learning algorithms.
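To make the statistical query (SQ) model mentioned in the abstract concrete, here is a minimal Python sketch of a simulated SQ oracle: the learner never sees individual examples, only the expectation of a bounded query function, accurate up to a tolerance `tau`. The toy task (labels equal to the first coordinate of points in $\{-1,1\}^3$) and the names `make_sq_oracle` and `correlation_weights` are illustrative assumptions, not the paper's construction. The sketch only illustrates the easy, non-robust side of the story, where a linear separator is recovered from a few queries.

```python
import itertools
import random

def make_sq_oracle(samples, tau, seed=0):
    """Simulated SQ oracle (illustrative): answers E[query(x, y)] up to +/- tau."""
    rng = random.Random(seed)

    def oracle(query):
        # Exact expectation of the query under the uniform distribution on `samples`.
        mean = sum(query(x, y) for x, y in samples) / len(samples)
        # The SQ model allows any answer within tau; here we add bounded noise.
        return mean + rng.uniform(-tau, tau)

    return oracle

def correlation_weights(oracle, dim):
    """One query per coordinate: estimate E[y * x_i] and use it as a weight."""
    return [oracle(lambda x, y, i=i: y * x[i]) for i in range(dim)]

# Hypothetical toy task: x in {-1, 1}^3, label y = x[0].
samples = [(x, x[0]) for x in itertools.product([-1, 1], repeat=3)]
tau = 0.05
oracle = make_sq_oracle(samples, tau)
w = correlation_weights(oracle, dim=3)

def predict(x):
    # Linear separator defined by the estimated correlations.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

Each query here takes values in $[-1, 1]$, as the SQ model requires of bounded queries. The paper's hardness result concerns robust learning, which this sketch does not attempt.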

Paper Structure

This paper contains 20 sections, 12 theorems, 31 equations, 1 figure.

Key Result

Theorem 1.1

For any $M, \varepsilon > 0$, there exists a classification task in $\mathbb{R}^d$ which is

Figures (1)

  • Figure 1: The distributions in Lemma \ref{one_dimensional_hard} are similar to discretized Gaussians, with careful discretization and weighting from Gauss-Hermite quadrature.

Theorems & Definitions (29)

  • Theorem 1.1 (informal)
  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Remark 2.4
  • Definition 2.5
  • Theorem 3.1
  • proof
  • Definition 3.2
  • Theorem 3.3
  • ...and 19 more