On adversarial training and the 1 Nearest Neighbor classifier

Amir Hagai; Yair Weiss

TL;DR

The paper addresses adversarial vulnerability in classifiers and compares adversarial training with a theoretically grounded 1NN baseline. Using the formal notion of robust accuracy $RA(X,\epsilon,p;f)$, the authors show that the 1NN classifier achieves $100\%$ robust accuracy on training data for $\epsilon = d_0/2$ when the distance between classes is $d_0$, and that this robustness extends to test data asymptotically for any $p$. Empirically, 1NN outperforms TRADES, a strong adversarial training algorithm, in average adversarial accuracy across 135 binary image classification tasks, and it outperforms almost all of 69 RobustBench models under perturbations that differ only slightly from those used during training. The findings advocate using 1NN as a strong robustness baseline and motivate exploring robustness in alternative feature spaces that retain high accuracy while offering provable resilience to small perturbations.
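The training-set claim can be checked numerically on toy data. The following is a minimal sketch, not the authors' code: the function names `class_distance` and `predict_1nn`, the two Gaussian clusters, and the use of random (rather than adversarially optimized) perturbations are all assumptions made for illustration. It computes $d_0$ as the smallest $\ell_2$ distance between training points with different labels, perturbs every training point by less than $d_0/2$, and checks that 1NN still returns the original label.

```python
import numpy as np

def class_distance(X, y):
    """Smallest l2 distance between two training points with different labels (d_0)."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    different = y[:, None] != y[None, :]
    return dists[different].min()

def predict_1nn(X_train, y_train, queries):
    """Label each query with the label of its nearest training point (l2 distance)."""
    dists = np.linalg.norm(queries[:, None, :] - X_train[None, :, :], axis=-1)
    return y_train[dists.argmin(axis=1)]

rng = np.random.default_rng(0)

# Hypothetical toy data: two 2D Gaussian clusters, one per class.
X0 = rng.normal(loc=[-3.0, 0.0], scale=0.5, size=(20, 2))
X1 = rng.normal(loc=[3.0, 0.0], scale=0.5, size=(20, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 20 + [1] * 20)

d0 = class_distance(X, y)
eps = d0 / 2

# Random perturbations whose l2 norm is strictly below eps.
directions = rng.normal(size=X.shape)
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
radii = rng.uniform(0.0, 0.999 * eps, size=(len(X), 1))
X_perturbed = X + directions * radii

# Fraction of perturbed training points that keep their original label.
robust_acc = np.mean(predict_1nn(X, y, X_perturbed) == y)
```

Random perturbations are only a sanity check. The guarantee itself does not depend on how the perturbation is chosen: a perturbed training point stays within $d_0/2$ of its original, while every point of another class remains more than $d_0/2$ away.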

Abstract

The ability to fool deep learning classifiers with tiny perturbations of the input has led to the development of adversarial training, in which the loss with respect to adversarial examples is minimized in addition to the loss on the training examples. While adversarial training improves the robustness of the learned classifiers, the procedure is computationally expensive, sensitive to hyperparameters, and may still leave the classifier vulnerable to other types of small perturbations. In this paper we compare the performance of adversarial training to that of the simple 1 Nearest Neighbor (1NN) classifier. We prove that under reasonable assumptions, the 1NN classifier will be robust to any small image perturbation of the training images. In experiments with 135 different binary image classification problems taken from CIFAR10, MNIST and Fashion-MNIST, we find that 1NN outperforms TRADES (a powerful adversarial training algorithm) in terms of average adversarial accuracy. In additional experiments with 69 robust models taken from the current adversarial robustness leaderboard, we find that 1NN outperforms almost all of them in terms of robustness to perturbations that are only slightly different from those used during training. Taken together, our results suggest that modern adversarial training methods still fall short of the robustness of the simple 1NN classifier. Our code can be found at https://github.com/amirhagai/On-Adversarial-Training-And-The-1-Nearest-Neighbor-Classifier

Keywords: Adversarial training

Paper Structure (5 sections, 6 theorems, 5 equations, 6 figures, 4 tables, 1 algorithm)

Introduction
Adversarial Robustness and Adversarial Training
The Robust Accuracy of 1NN
Experiments
Discussion

Key Result

Theorem

Denote by $X_c$ a set of examples that are $\delta$ confidently classified by the 1NN classifier. The robust accuracy of the 1NN classifier on this set with the $\ell_2$ norm and $\epsilon = \frac{\delta}{2}$ is $100\%$, i.e. $RA(X_c, \frac{\delta}{2}, 2; f) = 100\%$, where $f$ denotes the 1NN classifier.
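The argument behind this result, as illustrated in the Figure 4 caption, can be sketched with the triangle inequality. The notation here is an assumption based on that caption rather than the paper's own: $a$ is the training point of the correct class closest to $x$, at distance $d$; $b$ is any training point of another class, so $\|x - b\|_2 \ge d + \delta$; and $x'$ is a perturbed input with $\|x' - x\|_2 < \delta/2$.

```latex
\begin{align*}
\|x' - a\|_2 &\le \|x' - x\|_2 + \|x - a\|_2 < \tfrac{\delta}{2} + d, \\
\|x' - b\|_2 &\ge \|x - b\|_2 - \|x' - x\|_2 > (d + \delta) - \tfrac{\delta}{2} = d + \tfrac{\delta}{2}.
\end{align*}
```

Hence $\|x' - a\|_2 < \|x' - b\|_2$ for every training point $b$ of another class, so the nearest neighbor of $x'$ belongs to the correct class and the 1NN prediction is unchanged.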

Figures (6)

Figure 1: Adversarial images for CIFAR10 training images for a CNN classifier obtained with adversarial training. First row is the original image, second row is the adversarial image and last row is the normalized difference. Even though the model was trained to be robust to any perturbation in which all pixel values are less than $8/255$, it can be easily fooled by imperceptible perturbations that are only slightly different from those that it saw during training.
Figure 2: The definition of robust accuracy depends on a norm $p$ and a radius $\epsilon$. A robust classifier needs to correctly classify all examples $x_i$ as well as all examples $x'$ that are in an $\epsilon$ ball around $x_i$. In the figure, the linear classifier denoted by the solid line has 100% robust accuracy but the dashed line does not. In this paper we show that adversarial training produces classifiers that are robust to the specific values of $p,\epsilon$ that were used during training while 1NN is robust to any $p$ with sufficiently small $\epsilon$.
Figure 3: The decision boundary of TRADES (left) and 1NN on a simple 2D classification problem. The black square denotes the $\epsilon$ ball that defines adversarial accuracy (this was the $\epsilon$ ball used when the neural network was trained using TRADES). TRADES succeeds in making the decision boundary constant within the $\epsilon$ ball for some examples but not all. In contrast, the 1NN classifier achieves high robustness to any small perturbation of the training examples.
Figure 4: Illustration of the proof of the $\delta/2$ robustness theorem. If the distance to the closest point in the correct class is $d$ and the distance to the closest point in the incorrect class is $d+\delta$, then the 1NN classifier will be robust to any perturbation smaller than $\delta/2$.
Figure 5: Adversarial images for 1NN for the same CIFAR10 training images shown in Figure 1. First row is the original image, second row is the adversarial image and last row is the normalized difference. As can be seen in the histogram of pixel differences on top, to fool the 1NN classifier, almost all pixels must be changed by more than $8/255$.
...and 1 more figure
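The quantities in the Figure 4 caption translate directly into a per-example margin, which in turn gives a lower bound on the robust accuracy illustrated in Figure 2 for the $\ell_2$ norm. The sketch below is illustrative only: `nn_margin`, `certified_robust_accuracy`, and the toy points are hypothetical names and data, not taken from the paper or its repository. An example counts as certified at radius $\epsilon$ when its margin $\delta$ exceeds $2\epsilon$; misclassified examples have a negative margin and are never certified.

```python
import numpy as np

def nn_margin(X_train, y_train, x, label):
    """delta = (l2 distance to nearest wrong-class point) - (l2 distance to nearest correct-class point)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    d_correct = dists[y_train == label].min()
    d_wrong = dists[y_train != label].min()
    return d_wrong - d_correct

def certified_robust_accuracy(X_train, y_train, X_eval, y_eval, eps):
    """Fraction of evaluation points whose margin exceeds 2 * eps (a lower bound on l2 robust accuracy)."""
    margins = np.array([nn_margin(X_train, y_train, x, c)
                        for x, c in zip(X_eval, y_eval)])
    return float(np.mean(margins > 2 * eps))

# Hypothetical toy problem: one training point per class on a line.
X_train = np.array([[0.0, 0.0], [4.0, 0.0]])
y_train = np.array([0, 1])

# Three evaluation points with margins 2, 3 and -1 (the last one is misclassified).
X_eval = np.array([[1.0, 0.0], [3.5, 0.0], [2.5, 0.0]])
y_eval = np.array([0, 1, 0])

acc_small = certified_robust_accuracy(X_train, y_train, X_eval, y_eval, eps=0.5)
acc_large = certified_robust_accuracy(X_train, y_train, X_eval, y_eval, eps=1.2)
```

The strict inequality is a conservative choice: when the margin equals exactly $2\epsilon$, a perturbation of norm $\epsilon$ can produce a tie between the two classes.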

Theorems & Definitions (12)

Definition
Definition
Definition
Definition
Theorem
Proof
Theorem
Theorem
Corollary
Corollary
...and 2 more
