A Boundary Tilting Perspective on the Phenomenon of Adversarial Examples

Thomas Tanay, Lewis Griffin

TL;DR

Adversarial examples challenge the robustness of deep nets; the authors critique the "too linear" view and propose a boundary-tilting framework to explain them. They develop an analysis of the linear case and measure adversarial strength through the deviation angle between a classifier and the nearest centroid classifier. They show that boundary tilting can produce arbitrarily strong adversarial examples without hurting accuracy, and they introduce a taxonomy of adversarial examples tied to regularisation. The work connects practical vulnerability to data geometry and the level of regularisation, offering a path toward mitigating type-2 adversarial examples and toward understanding why deep nets remain susceptible despite strong performance.

Abstract

Deep neural networks have been shown to suffer from a surprising weakness: their classification outputs can be changed by small, non-random perturbations of their inputs. This adversarial example phenomenon has been explained as originating from deep networks being "too linear" (Goodfellow et al., 2014). We show here that the linear explanation of adversarial examples presents a number of limitations: the formal argument is not convincing, linear classifiers do not always suffer from the phenomenon, and when they do, their adversarial examples are different from the ones affecting deep networks. We propose a new perspective on the phenomenon. We argue that adversarial examples exist when the classification boundary lies close to the submanifold of sampled data, and present a mathematical analysis of this new perspective in the linear case. We define the notion of adversarial strength and show that it can be reduced to the deviation angle between the classifier considered and the nearest centroid classifier. Then, we show that the adversarial strength can be made arbitrarily high independently of the classification performance due to a mechanism that we call boundary tilting. This result leads us to define a new taxonomy of adversarial examples. Finally, we show that the adversarial strength observed in practice is directly dependent on the level of regularisation used and that the strongest adversarial examples, symptomatic of overfitting, can be avoided by using a proper level of regularisation.
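
As a rough illustration of the deviation angle mentioned in the abstract, the sketch below assumes a two-class linear problem and takes the normal of the nearest centroid classifier to be the difference between the two class centroids. The function name `deviation_angle` and the toy data are hypothetical, and the paper's formal definition may differ in detail.

```python
import numpy as np


def deviation_angle(w, X_i, X_j):
    """Angle in radians between a linear classifier's weight vector `w`
    and the normal of the nearest centroid boundary for classes I and J.

    Assumption: the nearest centroid boundary is the perpendicular bisector
    of the segment joining the class centroids, so its normal is the
    difference of the centroids.
    """
    w = np.asarray(w, dtype=float)
    centroid_direction = np.mean(X_j, axis=0) - np.mean(X_i, axis=0)
    cosine = np.dot(w, centroid_direction) / (
        np.linalg.norm(w) * np.linalg.norm(centroid_direction)
    )
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cosine, -1.0, 1.0)))


# Toy data (hypothetical): two classes separated along the first axis.
X_i = np.array([[0.0, 0.0], [0.0, 2.0]])
X_j = np.array([[4.0, 0.0], [4.0, 2.0]])

aligned = deviation_angle([1.0, 0.0], X_i, X_j)  # boundary not tilted
tilted = deviation_angle([1.0, 1.0], X_i, X_j)   # boundary tilted by 45 degrees
```

In this toy setting, a weight vector parallel to the centroid difference gives an angle of zero, and the angle grows as the boundary tilts away from the nearest centroid boundary.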

Paper Structure

This paper contains 20 sections, 16 equations, 18 figures.

Figures (18)

  • Figure 1: Adversarial examples for two different models (from Goodfellow et al., 2014).
  • Figure 2: Increasing the dimensionality of the problem does not make the phenomenon of adversarial examples worse. Whether the image size is $28 \times 28$ or $200 \times 200$, the weight vector found by linear SVM looks very similar to the one found by logistic regression in Goodfellow et al. (2014). The two SVM models have an error rate of $2.7\%$. The magnitude $\epsilon$ of the perturbations has been chosen in both cases such that $99\%$ of the digits in the test set are misclassified ($\epsilon_{28} = 4.6, \epsilon_{200} = 31. \approx \epsilon_{28} \times 200/28$).
  • Figure 3: Toy problem of two classes $I$ and $J$ that do not suffer from the phenomenon of adversarial examples. When we follow the procedure that normally leads to the creation of adversarial examples, we get instead real instances of images that belong to the other class. We call the images on the boundary the projected images and the images with opposed classification score the mirror images.
  • Figure 4: The weight vectors found by linear models resemble the average 3 of the MNIST training data from which the average 7 has been subtracted.
  • Figure 5: Schematic representations of two solutions to the adversarial examples paradox.
  • ...and 13 more figures
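
The scaling in the Figure 2 caption, where $\epsilon$ grows roughly with the image side length, can be checked with a small sketch. It assumes the perturbation is measured with the L2 norm and that the larger image is obtained by nearest-neighbour upsampling with an integer factor `k`. Both are assumptions made here for illustration, since the caption does not state the resizing method, and the function name is hypothetical.

```python
import numpy as np


def l2_ratio_after_upsampling(k, side=28, seed=0):
    """Ratio of L2 norms after and before nearest-neighbour upsampling.

    Each pixel is replicated into a k x k block, so the squared norm is
    multiplied by k**2 and the norm itself by k.
    """
    rng = np.random.default_rng(seed)
    perturbation = rng.normal(size=(side, side))
    upsampled = np.kron(perturbation, np.ones((k, k)))
    return np.linalg.norm(upsampled) / np.linalg.norm(perturbation)


# Hypothetical integer factor: 28 x 28 upsampled to 196 x 196.
ratio = l2_ratio_after_upsampling(7)

# Rescaling the caption's 28 x 28 magnitude by the side-length ratio.
scaled_epsilon = 4.6 * 200 / 28
```

Under these assumptions the L2 norm grows by the upsampling factor, which matches the side-length ratio $200/28$ used in the caption. Rescaling $4.6$ this way gives about $32.9$, the same order as the value reported for the $200 \times 200$ images.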