Theoretically Principled Trade-off between Robustness and Accuracy

Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, Michael I. Jordan

TL;DR

This work tackles the fundamental problem of balancing adversarial robustness with natural accuracy in classification. It develops a theoretical framework that decomposes robust error into natural error and boundary error, and derives a tight, differentiable upper bound using classification-calibrated surrogate losses. Guided by this theory, it introduces TRADES, a two-term objective that trades off natural accuracy against robustness near the decision boundary, and reports strong empirical performance on MNIST and CIFAR-10 as well as first place in the NeurIPS 2018 Adversarial Vision Challenge. The work provides a principled, scalable approach to robust learning and clarifies the trade-offs that govern defense design.
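The two-term structure of the objective can be illustrated with a minimal sketch. The summary above does not spell out the exact loss, so the sketch assumes a common multi-class form: cross-entropy on the natural input plus a KL divergence between the predictions on the natural and perturbed inputs. The function names, the weight `beta`, and the choice of KL are assumptions made for illustration, and the perturbed logits are taken as given rather than found by an inner maximization.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def trades_style_loss(nat_logits, adv_logits, label, beta=1.0):
    """Two-term loss for one example (hypothetical helper, not the authors' code).

    natural term: fit the clean input to its label
    robust term:  keep the prediction on the perturbed input close to the
                  prediction on the clean input; beta is an assumed weight
    """
    natural_term = cross_entropy(nat_logits, label)
    robust_term = kl_divergence(softmax(nat_logits), softmax(adv_logits))
    return natural_term + beta * robust_term
```

When the perturbed logits equal the natural logits the robust term vanishes and the loss reduces to plain cross-entropy; a larger `beta` puts more weight on keeping predictions stable under perturbation, at the cost of natural accuracy.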

Abstract

We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. Although this problem has been widely studied empirically, much remains unknown concerning the theory underlying this trade-off. In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. Inspired by our theoretical analysis, we also design a new defense method, TRADES, to trade adversarial robustness off against accuracy. Our proposed algorithm performs well experimentally on real-world datasets. The methodology is the foundation of our entry to the NeurIPS 2018 Adversarial Vision Challenge, in which we won 1st place out of ~2,000 submissions, surpassing the runner-up approach by $11.41\%$ in terms of mean $\ell_2$ perturbation distance.
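The decomposition of robust error into natural error plus boundary error can be checked on a toy case. The sketch below assumes a one-dimensional threshold classifier $f(x) = x - t$ with labels in $\{-1, +1\}$ and perturbations of size at most `eps`; the data points and the function name are invented for illustration. A point counts toward the natural error if it is misclassified, toward the boundary error if it is correctly classified but lies within `eps` of the decision boundary, and toward the robust error if some perturbation within `eps` causes a mistake.

```python
def error_decomposition(points, threshold, eps):
    """Return (natural, boundary, robust) error counts for f(x) = x - threshold.

    points: iterable of (x, y) pairs with y in {-1, +1} (toy data, assumed).
    """
    natural = boundary = robust = 0
    for x, y in points:
        margin = y * (x - threshold)           # positive means correct
        near_boundary = abs(x - threshold) <= eps
        if margin <= 0:
            natural += 1                       # already wrong without any attack
        elif near_boundary:
            boundary += 1                      # correct, but within eps of the boundary
        if margin <= eps:
            robust += 1                        # worst-case shift of size eps breaks it
    return natural, boundary, robust

toy_points = [(-2.0, -1), (-0.3, -1), (0.2, -1), (0.1, 1),
              (0.4, 1), (1.5, 1), (-0.6, 1)]
nat, bdy, rob = error_decomposition(toy_points, threshold=0.0, eps=0.5)
print(nat, bdy, rob)  # prints "2 3 5"
```

On this toy set the robust count equals the sum of the natural and boundary counts, which is the identity the abstract describes; the sketch does not cover the upper bound derived from the surrogate loss.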

Paper Structure

This paper contains 38 sections, 9 theorems, 40 equations, 8 figures, 10 tables, 2 algorithms.

Key Result

Lemma 2.1

Under the assumption that the surrogate loss is classification-calibrated, the function $\psi$ has the following properties: $\psi$ is non-decreasing, continuous, convex on $[0,1]$ and $\psi(0) = 0$.
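As a concrete instance, the properties in the lemma can be spot-checked numerically. The sketch below assumes the closed form $\psi(\theta) = 1 - \sqrt{1 - \theta^2}$ that the calibrated-loss literature gives for the exponential loss; this summary does not state that form, and the helper names are hypothetical. The check runs on a finite grid, so it can illustrate monotonicity, convexity, and $\psi(0) = 0$, but it cannot establish continuity.

```python
import math

def psi_exponential(theta):
    """Assumed closed form of psi for the exponential loss, theta in [0, 1]."""
    return 1.0 - math.sqrt(1.0 - theta * theta)

def check_lemma_properties(psi, n=200, tol=1e-12):
    """Grid check on [0, 1]: psi(0) = 0, non-decreasing, convex."""
    grid = [i / n for i in range(n + 1)]
    values = [psi(t) for t in grid]
    starts_at_zero = abs(values[0]) <= tol
    non_decreasing = all(b >= a - tol for a, b in zip(values, values[1:]))
    # Convexity on a uniform grid: second differences must be non-negative.
    convex = all(
        values[i - 1] + values[i + 1] - 2.0 * values[i] >= -tol
        for i in range(1, n)
    )
    return starts_at_zero, non_decreasing, convex
```

Calling `check_lemma_properties(psi_exponential)` returns a triple of booleans, one per property; the same helper can be applied to any other candidate $\psi$ passed in as a function.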

Figures (8)

  • Figure 1: Left figure: decision boundary learned by natural training method. Right figure: decision boundary learned by our adversarial training method, where the orange dotted line represents the decision boundary in the left figure. It shows that both methods achieve zero natural training error, while our adversarial training method achieves better robust training error than the natural training method.
  • Figure 2: Counterexample given by Eqn. \ref{equ: counterexample}.
  • Figure 3: Top-6 results (out of 2,000 submissions) in the NeurIPS 2018 Adversarial Vision Challenge. The vertical axis represents the mean $\ell_2$ perturbation distance that makes robust models fail to output correct labels.
  • Figure 4: Left figure: boundary neighborhood of linear classifier. Right figure: boundary neighborhood of non-linear classifier. Theorem \ref{theorem: vulnerability} shows that the mass of $S_{\text{linear}}$ is smaller than the mass of $S_{\text{non-linear}}$, provided that the underlying distribution over the instance space is a product of log-concave distributions on the real line.
  • Figure 5: Adversarial examples on MNIST dataset. In each subfigure, the image in the first row is the original image and we list the corresponding correct label beneath the image. We show the perturbed images in the second row. The differences between the perturbed images and the original images, i.e., the perturbations, are shown in the third row. In each column, the perturbed image and the perturbation are generated by FGSM$^{k}$ (white-box) attack on the model listed below. The labels beneath the perturbed images are the predictions of the corresponding models, which are different from the correct labels. We record the smallest perturbations in terms of $\ell_\infty$ norm that make the models predict a wrong label.
  • ...and 3 more figures

Theorems & Definitions (14)

  • Lemma 2.1: bartlett2006convexity
  • Theorem 3.1
  • Theorem 3.2
  • proof
  • proof
  • Theorem C.1
  • Lemma C.2: Theorem 9, barthe2001extremal
  • proof
  • Lemma C.3: Corollary 1, zhang2002covering
  • Lemma C.4: Theorem 4, zhang2002covering
  • ...and 4 more