Understanding and Mitigating the Tradeoff Between Robustness and Accuracy

Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John Duchi, Percy Liang

TL;DR

The paper analyzes why adversarial training can increase standard error even as robustness improves, focusing on noiseless linear regression and minimum-norm interpolation. It shows augmentation introduces an inductive-bias mismatch that can raise standard error, formalizing conditions with the covariance Σ and projection components. To mitigate this, it introduces Robust Self-Training (RST), proves a linear-regression version eliminates the tradeoff by regularizing toward the standard estimator, and provides strong empirical evidence on CIFAR-10 across perturbations and data regimes. The work highlights how unlabeled data can be leveraged to achieve simultaneous gains in robustness and accuracy, with practical implications for deploying robust models in real-world settings.

Abstract

Adversarial training augments the training set with perturbations to improve the robust error (over worst-case perturbations), but it often leads to an increase in the standard error (on unperturbed test inputs). Previous explanations for this tradeoff rely on the assumption that no predictor in the hypothesis class has low standard and robust error. In this work, we precisely characterize the effect of augmentation on the standard error in linear regression when the optimal linear predictor has zero standard and robust error. In particular, we show that the standard error could increase even when the augmented perturbations have noiseless observations from the optimal linear predictor. We then prove that the recently proposed robust self-training (RST) estimator improves robust error without sacrificing standard error for noiseless linear regression. Empirically, for neural networks, we find that RST with different adversarial training methods improves both standard and robust error for random and adversarial rotations and adversarial $\ell_\infty$ perturbations in CIFAR-10.

Paper Structure

This paper contains 67 sections, 10 theorems, 71 equations, 9 figures, 3 tables.

Key Result

Theorem 1

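The difference in the standard errors of the standard estimator $\hat{\theta}_\text{std}$ and augmented estimator $\hat{\theta}_\text{aug}$ can be written as $$L_\text{std}(\hat{\theta}_\text{std}) - L_\text{std}(\hat{\theta}_\text{aug}) = v^\top \Sigma v + 2 w^\top \Sigma v,$$ where $L_\text{std}$ denotes the standard error, $\Sigma$ is the population covariance, $v = \Pi_{\text{std}}^\perp \Pi_{\text{aug}} \theta^{\star}$ and $w = \Pi_{\text{aug}}^\perp \theta^{\star}$.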
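The decomposition into $v$ and $w$ can be checked numerically. The sketch below is illustrative only: it assumes minimum-norm interpolation with noiseless targets, takes the standard error to be $(\theta - \theta^\star)^\top \Sigma (\theta - \theta^\star)$, and uses randomly drawn data. The names `X_std`, `X_ext`, `row_space_projector`, and `std_error` are hypothetical helpers, not taken from the paper. It verifies that the gap between the two standard errors equals $v^\top \Sigma v + 2 w^\top \Sigma v$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_std, n_ext = 6, 2, 2  # illustrative sizes (assumption)

# Hypothetical problem instance: true parameter, covariance, and data.
theta_star = rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T                        # population covariance (PSD)
X_std = rng.normal(size=(n_std, d))    # standard training inputs
X_ext = rng.normal(size=(n_ext, d))    # extra (augmented) inputs
X_aug = np.vstack([X_std, X_ext])

def row_space_projector(X):
    """Orthogonal projector onto the row space of X."""
    return np.linalg.pinv(X) @ X

def std_error(theta):
    """Standard error of a linear predictor under noiseless targets."""
    diff = theta - theta_star
    return diff @ Sigma @ diff

# Minimum-norm interpolants of noiseless targets X @ theta_star.
theta_std = np.linalg.pinv(X_std) @ (X_std @ theta_star)
theta_aug = np.linalg.pinv(X_aug) @ (X_aug @ theta_star)

I = np.eye(d)
Pi_std_perp = I - row_space_projector(X_std)   # projector onto Null(X_std)
Pi_aug = row_space_projector(X_aug)
Pi_aug_perp = I - Pi_aug                       # projector onto Null(X_aug)

v = Pi_std_perp @ Pi_aug @ theta_star
w = Pi_aug_perp @ theta_star

lhs = std_error(theta_std) - std_error(theta_aug)
rhs = v @ Sigma @ v + 2 * (w @ Sigma @ v)
print(np.isclose(lhs, rhs))
```

The first term is never negative, so augmentation can only increase the standard error when the cross term $2 w^\top \Sigma v$ is negative enough to outweigh it. With $\Sigma = I$ the cross term vanishes, because $v$ and $w$ are orthogonal.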

Figures (9)

  • Figure 1: Gap between the standard error of adversarial training (Madry et al., 2018) with $\ell_\infty$ perturbations and that of standard training. The gap decreases as the training set size increases, suggesting that the tradeoff between standard and robust error should disappear with infinite data.
  • Figure 2: We consider function interpolation via cubic splines. (Left) The underlying distribution $P_\mathsf{x}$, denoted by the sizes of the circles. The true function is a staircase. (Middle) With a small number of standard training samples (purple circles), an augmented estimator that fits local perturbations (green crosses) has a large error. In contrast, the standard estimator, which does not fit perturbations, is a simple straight line and has small error. (Right) Robust self-training (RST) regularizes the predictions of an augmented estimator towards the predictions of the standard estimator, thereby obtaining small error on both test points and their perturbations.
  • Figure 3: Illustration of the 3-D example described in Sec. \ref{sec:simple-3D}. (a)-(b) Effect of augmentation on parameter error for different $\theta^{\star}$. We show the projections of the standard estimator $\hat{\theta}_\text{std}$ (blue circle), augmented estimator $\hat{\theta}_\text{aug}$ (orange arrow), and true parameters $\theta^{\star}$ (black arrow) on $\text{Null}(X_\text{std})$, spanned by $e_1$ and $e_2$. For simplicity of presentation, we omit the projection operator in the figure labels. Depending on $\theta^{\star}$, the parameter error of $\hat{\theta}_\text{aug}$ along $e_2$ could be larger or smaller than the parameter error of $\hat{\theta}_\text{std}$ along $e_2$. (c)-(d) Dependence of the space of safe augmentations on $\Sigma$. Visualization of the space of extra data points $x_{\text{ext}}$ (orange) that do not cause an increase in the standard error for the illustrated $\theta^{\star}$ (black vector), as a result of Theorem \ref{thm:main}.
  • Figure 4: Top 4 eigenvectors of $\Sigma$ in the splines problem (from Figure 2), representing wave functions in the input space. The "global" eigenfunctions, which vary less over the domain, correspond to larger eigenvalues, making errors in global dimensions costly in terms of test error.
  • Figure 5: Illustration of the four components of the RST loss (Equation \ref{eqn:general-x}) in the special case of linear regression (Eq. \ref{eqn:linear-x}). Green cells contain hard constraints where the optimal $\theta^\star$ obtains zero loss. The orange cell contains the soft constraint that is minimized while satisfying the hard constraints to obtain the final linear RST estimator.
  • ...and 4 more figures
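
The hard-constraint and soft-constraint structure in the Figure 5 caption can be illustrated with a simplified sketch. This is not the paper's exact linear RST formulation. It assumes that the population covariance `Sigma` is known (rather than estimated from unlabeled data), that the extra points carry noiseless targets, and that the soft constraint is the $\Sigma$-weighted distance to the standard estimator. All variable names are hypothetical. Under these assumptions, the resulting estimator fits all augmented points exactly and its standard error is no larger than that of the standard estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6  # illustrative dimension (assumption)

theta_star = rng.normal(size=d)
B = rng.normal(size=(d, d))
Sigma = B @ B.T + 0.1 * np.eye(d)      # assumed known, positive definite

X_std = rng.normal(size=(2, d))        # standard training inputs
X_ext = rng.normal(size=(2, d))        # extra (perturbed) inputs
X_aug = np.vstack([X_std, X_ext])
y_aug = X_aug @ theta_star             # noiseless targets

def std_error(theta):
    diff = theta - theta_star
    return diff @ Sigma @ diff

# Minimum-norm interpolants for the standard and augmented data.
theta_std = np.linalg.pinv(X_std) @ (X_std @ theta_star)
theta_aug = np.linalg.pinv(X_aug) @ y_aug

# Hard constraints: X_aug @ theta == y_aug.
# Every feasible theta is theta_aug + N @ z, with N a basis of Null(X_aug).
_, s, Vt = np.linalg.svd(X_aug)
rank = int(np.sum(s > 1e-10))
N = Vt[rank:].T                        # shape (d, d - rank)

# Soft constraint: among feasible theta, minimize the Sigma-weighted
# distance to theta_std (closed-form solution of the normal equations).
z = np.linalg.solve(N.T @ Sigma @ N, N.T @ Sigma @ (theta_std - theta_aug))
theta_rst = theta_aug + N @ z

print(std_error(theta_std), std_error(theta_aug), std_error(theta_rst))
```

The guarantee in this sketch follows from a Pythagorean argument. `theta_rst` is the $\Sigma$-weighted projection of `theta_std` onto an affine set that contains `theta_star`, so its $\Sigma$-weighted distance to `theta_star` cannot exceed that of `theta_std`.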

Theorems & Definitions (13)

  • Theorem 1
  • Corollary 1
  • Proposition 1
  • Theorem 2
  • Lemma 1
  • Lemma 2
  • Corollary 2
  • Theorem 3
  • proof
  • Theorem 3
  • ...and 3 more