Unlabeled Data Improves Adversarial Robustness

Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, John C. Duchi

TL;DR

This work tackles the problem that adversarial robustness historically requires far more labeled data than standard accuracy. It shows that unlabeled data, via self-training, can bridge this gap, both theoretically in a Gaussian model and empirically on CIFAR-10 and SVHN, enabling high robust accuracy with roughly as many labels as high standard accuracy requires. The authors instantiate robust self-training with two robust-loss variants (adversarial training and stability training) and demonstrate state-of-the-art or near-state-of-the-art results for both heuristic robustness and certified robustness via randomized smoothing. The findings highlight the practical potential of unlabeled data to enhance robustness and raise important questions about the relevance of the unlabeled data and the limits of semi-supervision in robustness settings, all with reproducible experiments and code.
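Both variants can be read as one objective that differs only in its regularizer. As a sketch, assuming a TRADES-style split into a standard cross-entropy term plus a $\beta$-weighted regularizer, and assuming uniform weighting of labeled and pseudo-labeled examples (the paper's exact weighting may differ), robust self-training minimizes

$$
\min_\theta \; \frac{1}{n + \tilde n}\Big[\sum_{i=1}^{n} L_{\mathrm{robust}}(\theta; x_i, y_i) + \sum_{j=1}^{\tilde n} L_{\mathrm{robust}}(\theta; \tilde x_j, \tilde y_j)\Big],
\qquad
L_{\mathrm{robust}}(\theta; x, y) = \ell_{\mathrm{CE}}(\theta; x, y) + \beta\, L_{\mathrm{reg}}(\theta; x),
$$

$$
L^{\mathrm{adv}}_{\mathrm{reg}}(\theta; x) = \max_{\|x' - x\|_\infty \le \epsilon} D_{\mathrm{KL}}\big(p_\theta(\cdot \mid x) \,\|\, p_\theta(\cdot \mid x')\big),
\qquad
L^{\mathrm{stab}}_{\mathrm{reg}}(\theta; x) = \mathbb{E}_{x' \sim \mathcal{N}(x, \sigma^2 I)}\, D_{\mathrm{KL}}\big(p_\theta(\cdot \mid x) \,\|\, p_\theta(\cdot \mid x')\big),
$$

where $\tilde y_j$ is the pseudo-label that an intermediate model, trained on the $n$ labeled examples alone, assigns to the unlabeled input $\tilde x_j$. The adversarial variant approximates the inner maximum with an iterative attack, while the stability variant uses Gaussian noise, which pairs naturally with certification by randomized smoothing.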

Abstract

We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning. Theoretically, we revisit the simple Gaussian model of Schmidt et al. that shows a sample complexity gap between standard and robust classification. We prove that unlabeled data bridges this gap: a simple semisupervised learning procedure (self-training) achieves high robust accuracy using the same number of labels required for achieving high standard accuracy. Empirically, we augment CIFAR-10 with 500K unlabeled images sourced from 80 Million Tiny Images and use robust self-training to outperform state-of-the-art robust accuracies by over 5 points in (i) $\ell_\infty$ robustness against several strong attacks via adversarial training and (ii) certified $\ell_2$ and $\ell_\infty$ robustness via randomized smoothing. On SVHN, adding the dataset's own extra training set with the labels removed provides gains of 4 to 10 points, within 1 point of the gain from using the extra labels.
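The Gaussian-model argument can be checked numerically. The sketch below assumes the standard setup of Schmidt et al. as we understand it: a uniform label $y \in \{\pm 1\}$, an input $x \sim \mathcal{N}(y\mu, \sigma^2 I)$, a linear classifier $\mathrm{sign}(\theta^\top x)$, and the estimator $\hat\theta = \frac{1}{n}\sum_i y_i x_i$. Self-training pseudo-labels the unlabeled inputs with the labeled-only estimator and then re-estimates $\theta$ from them. Because an $\ell_\infty$ adversary lowers the margin $y\,\theta^\top x$ by exactly $\epsilon\|\theta\|_1$, the robust error has the closed form $\Phi\big((\epsilon\|\theta\|_1 - \theta^\top\mu)/(\sigma\|\theta\|_2)\big)$. The dimension, noise level, sample sizes, and $\epsilon$ are illustrative choices, not the constants from the paper's theorems.

```python
import math

import numpy as np


def gaussian_cdf(t: float) -> float:
    """Standard normal CDF Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def robust_error(theta: np.ndarray, mu: np.ndarray, sigma: float, eps: float) -> float:
    """Exact l_inf robust error of x -> sign(theta @ x) when x ~ N(y * mu, sigma^2 I).

    The worst-case perturbation reduces y * theta @ x by eps * ||theta||_1;
    eps = 0 gives the standard error.
    """
    margin = theta @ mu - eps * np.abs(theta).sum()
    return gaussian_cdf(-margin / (sigma * np.linalg.norm(theta)))


def sample(rng: np.random.Generator, n: int, mu: np.ndarray, sigma: float):
    """Draw n labeled points from the two-class Gaussian model."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu + sigma * rng.standard_normal((n, mu.size))
    return x, y


rng = np.random.default_rng(0)
d, sigma, eps = 500, 5.0, 0.5        # illustrative values (assumption)
n_labeled, n_unlabeled = 20, 4000    # few labels, many unlabeled inputs
mu = np.ones(d)                      # class mean with ||mu||^2 = d

# Supervised estimator from the labeled data only.
x_lab, y_lab = sample(rng, n_labeled, mu, sigma)
theta_lab = (y_lab[:, None] * x_lab).mean(axis=0)

# Self-training: discard the true labels, pseudo-label with theta_lab, re-estimate.
x_unl, _ = sample(rng, n_unlabeled, mu, sigma)
y_pseudo = np.sign(x_unl @ theta_lab)
theta_st = (y_pseudo[:, None] * x_unl).mean(axis=0)

for name, theta in [("labeled only", theta_lab), ("self-training", theta_st)]:
    std_err = robust_error(theta, mu, sigma, 0.0)
    rob_err = robust_error(theta, mu, sigma, eps)
    print(f"{name}: standard error {std_err:.4f}, robust error {rob_err:.4f}")
```

With these settings the labeled-only estimator already has low standard error, so its pseudo-labels are nearly all correct, yet its robust error is much larger; the self-trained estimator averages over many more inputs, shrinking the noise in $\|\theta\|_1$ and closing most of that gap.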

Paper Structure

This paper contains 95 sections, 11 theorems, 98 equations, 15 figures, 6 tables, 1 algorithm.

Key Result

Proposition 1

There exists a universal constant $r$ such that for all $\epsilon^2 \sqrt{d/n_0} \ge r$,

Figures (15)

  • Figure 1: Certified defense. Guaranteed CIFAR-10 test accuracy under all $\ell_2$ and $\ell_\infty$ attacks. Stability-based robust self-training with 500K unlabeled Tiny Images ($\texttt{RST}_\texttt{stab}(\texttt{50K+500K})$) outperforms stability training with only labeled data ($\texttt{Baseline}_\texttt{stab}(\texttt{50K})$). (a) Accuracy vs. $\ell_2$ radius, certified via randomized smoothing cohen2019certified. Shaded regions indicate variation across 3 runs. Accuracy at $\ell_2$ radius 0.435 implies accuracy at $\ell_\infty$ radius 2/255. (b) The implied $\ell_\infty$ certified accuracy is comparable to the state-of-the-art in methods that directly target $\ell_\infty$ robustness.
  • Figure 2: SVHN test accuracy for robust training without the extra data, with the extra data unlabeled (self-training), and with the labeled extra data. Left: Adversarial training and accuracies under $\ell_\infty$ attack with $\epsilon=4/255$. Right: Stability training and certified $\ell_2$ accuracies as a function of perturbation radius. Most of the gains from extra data come from the unlabeled inputs.
  • Figure 3: Comparison of training traces for different hyperparameters of adversarial training. Dashed lines show standard accuracy on the entire CIFAR-10 test set, and solid lines show robust accuracy against $\texttt{PG}_\texttt{TRADES}$ evaluated on the first 500 images in the CIFAR-10 test set.
  • Figure 4: Certified accuracy as a function of $\ell_2$ perturbation radius, comparing two architectures, two hyperparameter sets and two training objectives. (a) Both stability training and our hyperparameter choice improve performance. (b) Increasing model capacity improves performance further, and stability training remains beneficial.
  • Figure 5: Random images from the 80 Million Tiny Images data. (a) Images drawn from the entire dataset. (b) Images drawn from the subset with keywords that appeared in CIFAR-10; matching keywords correlate only weakly with membership in one of the CIFAR-10 classes.
  • ...and 10 more figures
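Two pieces of arithmetic sit behind Figure 1. First, any perturbation with $\|\delta\|_\infty \le \epsilon$ satisfies $\|\delta\|_2 \le \epsilon\sqrt{d}$, and a CIFAR-10 image has $d = 32 \times 32 \times 3 = 3072$ coordinates, so $\epsilon = 2/255$ maps to an $\ell_2$ radius of about 0.4347, just under 0.435. Second, randomized smoothing in the style of cohen2019certified certifies an $\ell_2$ radius of $\sigma\,\Phi^{-1}(p_A)$ when the top class has probability at least $p_A$ under Gaussian noise of scale $\sigma$ (taking $p_B = 1 - p_A$). The sketch below computes both; the noise level and $p_A$ are hypothetical, and a real certificate would use a statistical lower bound on $p_A$ estimated from Monte Carlo samples, which is omitted here.

```python
import math
from statistics import NormalDist

CIFAR10_DIM = 32 * 32 * 3  # 3072 input coordinates per image


def linf_to_l2_radius(eps_inf: float, dim: int = CIFAR10_DIM) -> float:
    """Smallest l2 radius whose ball contains the l_inf ball of radius eps_inf."""
    return eps_inf * math.sqrt(dim)


def smoothing_l2_radius(p_a_lower: float, sigma: float) -> float:
    """Certified l2 radius sigma * Phi^{-1}(p_A) for a Gaussian-smoothed classifier.

    Assumes p_B = 1 - p_A; returns 0.0 (no certificate) when p_A <= 1/2.
    """
    if p_a_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_a_lower)


needed = linf_to_l2_radius(2 / 255)
print(f"l2 radius implied by l_inf 2/255: {needed:.4f}")

# Hypothetical smoothing noise and top-class probability, for illustration only.
radius = smoothing_l2_radius(0.99, sigma=0.25)
print(f"certified l2 radius: {radius:.4f}, covers l_inf 2/255: {radius >= needed}")
```

Because the $\ell_2$ certificate covers the whole $\ell_\infty$ ball once the radius exceeds $\epsilon\sqrt{d}$, any $\ell_2$-certified accuracy curve can be read off at 0.435 to obtain an $\ell_\infty$ guarantee at $2/255$, which is how panel (b) is derived from panel (a).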

Theorems & Definitions (17)

  • Proposition 1
  • Theorem 1: schmidt2018adversarially
  • Theorem 2
  • Lemma 1
  • Proof
  • Proposition 1
  • Proof
  • Theorem 2: schmidt2018adversarially
  • Proof
  • Lemma 2
  • ...and 7 more