Theoretically Principled Trade-off between Robustness and Accuracy
Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, Michael I. Jordan
TL;DR
This work addresses the trade-off between adversarial robustness and natural accuracy in classification. It decomposes the robust error into the natural error plus a boundary error, and uses classification-calibrated surrogate losses to derive a differentiable upper bound that is the tightest possible uniformly over all probability distributions and measurable predictors. Guided by this analysis, it introduces TRADES, a defense with a two-term objective that trades natural accuracy off against the robustness governed by the boundary error. TRADES performs well on MNIST and CIFAR-10 and is the basis of the winning entry in the NeurIPS 2018 Adversarial Vision Challenge. The work provides a principled, scalable approach to robust learning and clarifies the trade-off that governs defense design.
Abstract
We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. Although this problem has been widely studied empirically, much remains unknown concerning the theory underlying this trade-off. In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and the boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. Inspired by our theoretical analysis, we also design a new defense method, TRADES, to trade adversarial robustness off against accuracy. Our proposed algorithm performs well experimentally on real-world datasets. The methodology is the foundation of our entry to the NeurIPS 2018 Adversarial Vision Challenge, in which we won first place out of ~2,000 submissions, surpassing the runner-up approach by $11.41\%$ in terms of mean $\ell_2$ perturbation distance.
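To make the decomposition of the robust error concrete, the following is a minimal sketch on a toy one-dimensional problem. It is not the TRADES algorithm, only a numerical check that robust error equals natural error plus boundary error. The linear classifier `f`, its parameters `w` and `b`, the sample list, and the radius `eps` are all illustrative assumptions, not taken from the paper. For a linear score the worst-case margin inside an interval of radius `eps` has the closed form `y * f(x) - eps * |w|`, which keeps the example exact.

```python
# Toy check that robust error = natural error + boundary error
# for a hypothetical 1-D linear classifier (all values are assumptions).

def f(x, w=2.0, b=-1.0):
    """Score of an illustrative linear classifier; sign(f(x)) is the prediction."""
    return w * x + b

def error_rates(samples, eps, w=2.0, b=-1.0):
    """Return (natural, boundary, robust) error rates on labeled samples (x, y), y in {-1, +1}."""
    nat = bdy = rob = 0
    for x, y in samples:
        margin = y * f(x, w, b)
        # Smallest margin reachable by any x' with |x' - x| <= eps (exact for linear f).
        worst = margin - eps * abs(w)
        if margin <= 0:
            nat += 1          # misclassified without any perturbation
        elif worst <= 0:
            bdy += 1          # correctly classified, but within eps of the decision boundary
        if worst <= 0:
            rob += 1          # some allowed perturbation causes an error
    n = len(samples)
    return nat / n, bdy / n, rob / n

# Illustrative data; the decision boundary of f sits at x = 0.5.
samples = [
    (0.0, -1), (0.2, -1), (0.45, -1), (0.7, -1),
    (0.3, +1), (0.55, +1), (0.9, +1), (1.0, +1),
]

nat, bdy, rob = error_rates(samples, eps=0.1)
print(f"natural={nat}, boundary={bdy}, robust={rob}")
```

In this toy setting the points at `x = 0.45` and `x = 0.55` are classified correctly but lie within `eps` of the boundary, so they contribute only to the boundary error; this is the kind of term that the second part of a TRADES-style objective is meant to control.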
