On adversarial training and the 1 Nearest Neighbor classifier
Amir Hagai, Yair Weiss
TL;DR
The paper addresses the adversarial vulnerability of classifiers and compares adversarial training with a theoretically grounded 1NN baseline. Using the formal notion of robust accuracy $RA(X,\epsilon,p;f)$, the authors show that when training examples from different classes are separated by a distance of at least $d_0$, the 1NN classifier achieves $100\%$ robust accuracy on the training data for $\epsilon = d_0/2$, and that this robustness extends asymptotically to test data for any $\ell_p$ norm. Empirically, 1NN outperforms the state-of-the-art adversarial training method TRADES in average adversarial accuracy across 135 binary classification tasks, and it is more robust than almost all of 69 RobustBench models to perturbations that differ slightly from those seen during training. The findings advocate using 1NN as a strong robustness baseline and motivate exploring feature spaces that retain high accuracy while offering provable resilience to small perturbations.
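The training-set claim follows from the triangle inequality. Read $RA(X,\epsilon,p;f)$ as the fraction of labeled points $(x,y)\in X$ for which $f(x')=y$ for every $x'$ with $\|x'-x\|_p\le\epsilon$ (our reading of the notation). Then a 1NN prediction at $x$ cannot change under any perturbation smaller than half the gap between the distance to the nearest other-class training point and the distance to the nearest same-class training point. For a training point the second distance is $0$, so its certified radius is at least $d_0/2$. The sketch below is a brute-force illustration under those assumptions. The function names are hypothetical and are not taken from the authors' repository, and the certificate gives a lower bound on robust accuracy rather than its exact value.

```python
import math
from typing import List, Sequence

# Illustrative sketch only: brute-force 1NN and a triangle-inequality certificate.
Point = Sequence[float]


def lp_dist(a: Point, b: Point, p: float) -> float:
    """l_p distance between two vectors; p = math.inf gives the max norm."""
    diffs = [abs(u - v) for u, v in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def predict_1nn(x: Point, X: List[Point], y: List[int], p: float) -> int:
    """Label of the closest training point (ties go to the lowest index)."""
    best = min(range(len(X)), key=lambda i: lp_dist(x, X[i], p))
    return y[best]


def class_separation(X: List[Point], y: List[int], p: float) -> float:
    """d_0: smallest distance between two training points with different labels."""
    return min(
        lp_dist(X[i], X[j], p)
        for i in range(len(X))
        for j in range(i + 1, len(X))
        if y[i] != y[j]
    )


def certified_radius(x: Point, label: int, X: List[Point], y: List[int], p: float) -> float:
    """Radius r such that 1NN predicts `label` for every x' with ||x' - x||_p < r.

    If d_same / d_other are the distances from x to the nearest same-label /
    other-label training point, then for ||x' - x||_p < r = (d_other - d_same) / 2
    the nearest same-label point is strictly closer to x' than any other-label point.
    """
    d_same = min(lp_dist(x, X[i], p) for i in range(len(X)) if y[i] == label)
    d_other = min(lp_dist(x, X[i], p) for i in range(len(X)) if y[i] != label)
    return (d_other - d_same) / 2.0


def robust_accuracy_lower_bound(
    X_eval: List[Point], y_eval: List[int], X: List[Point], y: List[int], p: float, eps: float
) -> float:
    """Fraction of evaluation points whose certified radius strictly exceeds eps."""
    certified = sum(
        certified_radius(x, lab, X, y, p) > eps for x, lab in zip(X_eval, y_eval)
    )
    return certified / len(X_eval)
```

For example, with the four training points `[0,0], [1,0], [4,0], [5,0]` labeled `0, 0, 1, 1`, $d_0 = 3$ and every training point has a certified radius of at least $1.5$. The certificate therefore covers every $\epsilon < d_0/2$. At exactly $\epsilon = d_0/2$, a perturbed point can be equidistant from two classes, so the outcome depends on how ties are broken.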
Abstract
The ability to fool deep learning classifiers with tiny perturbations of the input has led to the development of adversarial training, in which the loss on adversarial examples is minimized in addition to the loss on the training examples. While adversarial training improves the robustness of the learned classifiers, the procedure is computationally expensive and sensitive to hyperparameters, and it may still leave the classifier vulnerable to other types of small perturbations. In this paper we compare the performance of adversarial training to that of the simple 1 Nearest Neighbor (1NN) classifier. We prove that under reasonable assumptions, the 1NN classifier will be robust to {\em any} small image perturbation of the training images. In experiments with 135 different binary image classification problems taken from CIFAR10, MNIST and Fashion-MNIST, we find that 1NN outperforms TRADES (a powerful adversarial training algorithm) in terms of average adversarial accuracy. In additional experiments with 69 robust models taken from the current adversarial robustness leaderboard, we find that 1NN outperforms almost all of them in terms of robustness to perturbations that are only slightly different from those used during training. Taken together, our results suggest that modern adversarial training methods still fall short of the robustness of the simple 1NN classifier. Our code can be found at \url{https://github.com/amirhagai/On-Adversarial-Training-And-The-1-Nearest-Neighbor-Classifier}
\keywords{Adversarial training}
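Written out, one common form of the adversarial training objective described above, with the clean loss and the worst-case loss inside an $\ell_p$ ball of radius $\epsilon$ minimized together, is the following. The notation ($f_\theta$ for the classifier, $\ell$ for the loss) is ours. Specific methods differ in the adversarial term: TRADES, for instance, replaces it with a KL divergence between the predictions on $x$ and on the perturbed input, weighted by a trade-off parameter.

$$
\min_{\theta}\; \mathbb{E}_{(x,y)}\Big[\, \ell\big(f_\theta(x),\, y\big) \;+\; \max_{\|\delta\|_p \le \epsilon} \ell\big(f_\theta(x+\delta),\, y\big) \Big]
$$

The inner maximization is typically approximated by an iterative attack at every training step, which is one source of the computational cost mentioned above.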
