Unlabeled Data Improves Adversarial Robustness
Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, John C. Duchi
TL;DR
Adversarially robust classification has been shown to need far more labeled data than standard classification. This work shows that unlabeled data, used through self-training, can bridge that gap: in theory for a simple Gaussian model, and in practice on CIFAR-10 and SVHN, high robust accuracy becomes attainable with roughly the number of labels needed for high standard accuracy. The authors instantiate robust self-training with two robust losses (adversarial training and stability training) and improve on prior state-of-the-art results for both empirical $\ell_\infty$ robustness against strong attacks and certified robustness via randomized smoothing. The results show the practical value of unlabeled data for robustness and raise open questions about how relevant the unlabeled data must be and where semisupervised learning stops helping. The experiments are released with code.
Abstract
We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning. Theoretically, we revisit the simple Gaussian model of Schmidt et al. that shows a sample complexity gap between standard and robust classification. We prove that unlabeled data bridges this gap: a simple semisupervised learning procedure (self-training) achieves high robust accuracy using the same number of labels required for achieving high standard accuracy. Empirically, we augment CIFAR-10 with 500K unlabeled images sourced from 80 Million Tiny Images and use robust self-training to outperform state-of-the-art robust accuracies by over 5 points in (i) $\ell_\infty$ robustness against several strong attacks via adversarial training and (ii) certified $\ell_2$ and $\ell_\infty$ robustness via randomized smoothing. On SVHN, adding the dataset's own extra training set with the labels removed provides gains of 4 to 10 points, within 1 point of the gain from using the extra labels.
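The theoretical claim can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code. It assumes the Gaussian model takes the form $x \sim \mathcal{N}(y\mu, \sigma^2 I)$ with $y$ uniform on $\{-1, +1\}$, that the classifier is linear, $x \mapsto \mathrm{sign}(\theta^\top x)$, and that both the supervised and the self-trained estimators are sample means of $y \cdot x$. The dimension, noise level, perturbation radius, and sample sizes are illustrative choices, and the helper names (`sample`, `self_train`, `demo`) are hypothetical.

```python
import numpy as np
from math import erf, sqrt


def sample(n, mu, sigma, rng):
    """Draw n points from the assumed model: y uniform in {-1,+1}, x ~ N(y*mu, sigma^2 I)."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu + sigma * rng.standard_normal((n, mu.size))
    return x, y


def robust_accuracy(theta, mu, sigma, eps):
    """Closed-form accuracy of x -> sign(theta @ x) under l_inf perturbations of radius eps.

    The worst-case perturbation shifts y * theta @ x by eps * ||theta||_1,
    and y * theta @ x ~ N(mu @ theta, sigma^2 ||theta||_2^2), so the robust
    accuracy is the Gaussian CDF of the normalized margin. eps = 0 gives
    standard accuracy.
    """
    margin = (mu @ theta - eps * np.abs(theta).sum()) / (sigma * np.linalg.norm(theta))
    return 0.5 * (1.0 + erf(margin / sqrt(2.0)))


def self_train(x_lab, y_lab, x_unl):
    """Fit on labels, pseudo-label the unlabeled pool, then refit on pseudo-labels only."""
    theta_init = (y_lab[:, None] * x_lab).mean(axis=0)
    pseudo = np.sign(x_unl @ theta_init)
    theta_final = (pseudo[:, None] * x_unl).mean(axis=0)
    return theta_init, theta_final


def demo(seed=0, d=625, sigma=5.0, eps=0.4, n_labeled=5, n_unlabeled=10_000):
    """Run one trial with illustrative (assumed) parameter values."""
    rng = np.random.default_rng(seed)
    mu = np.ones(d)
    x_lab, y_lab = sample(n_labeled, mu, sigma, rng)
    x_unl, _ = sample(n_unlabeled, mu, sigma, rng)  # true labels are discarded
    theta_init, theta_final = self_train(x_lab, y_lab, x_unl)
    return {
        "standard_init": robust_accuracy(theta_init, mu, sigma, 0.0),
        "robust_init": robust_accuracy(theta_init, mu, sigma, eps),
        "robust_final": robust_accuracy(theta_final, mu, sigma, eps),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.4f}")
```

In this setup the few labeled points are enough for good standard accuracy but leave the estimate too noisy for the $\ell_\infty$ margin, while averaging over many pseudo-labeled points reduces that noise. The sketch uses no robust loss at all; the CIFAR-10 and SVHN experiments described above combine self-training with adversarial training or randomized smoothing, which this toy example does not attempt to reproduce.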
