The Adversarial Consistency of Surrogate Risks for Binary Classification
Natalie Frank, Jonathan Niles-Weed
TL;DR
This work gives a complete characterization of adversarially consistent surrogate losses for robust binary classification under $\epsilon$-ball perturbations. It identifies a necessary and sufficient condition, $C_\phi^*(\tfrac{1}{2}) < \phi(0)$, under which a surrogate is adversarially consistent, and shows that common convex losses fail this condition while the $\rho$-margin loss and the shifted sigmoid satisfy it. The authors use minimax duality with $W_\infty$ perturbation sets to relate adversarial risks to dual objectives, and they prove a quantitative excess-risk bound for the $\rho$-margin loss, which shows that minimizing its adversarial surrogate risk also reduces the adversarial classification error. These results guide the design of surrogate losses for robust learning and lay groundwork for extending the theory to other perturbation models and loss families.
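As an illustration of the condition $C_\phi^*(\tfrac{1}{2}) < \phi(0)$, the following is a minimal numerical sketch, not the authors' code. It assumes the standard conditional-risk definition $C_\phi^*(\eta) = \inf_\alpha \, \eta\,\phi(\alpha) + (1-\eta)\,\phi(-\alpha)$ and approximates the infimum by a grid search. The function names, the grid, and the tolerance are hypothetical choices made for this example.

```python
import numpy as np

# Candidate surrogate losses phi(alpha); smaller is better for large alpha.
def hinge(a):
    return np.maximum(0.0, 1.0 - a)

def logistic(a):
    return np.logaddexp(0.0, -a)  # log(1 + exp(-a)), numerically stable

def rho_margin(a, rho=1.0):
    return np.clip(1.0 - a / rho, 0.0, 1.0)  # min(1, max(0, 1 - a/rho))

def c_star_half(phi, grid):
    """Grid approximation of C_phi^*(1/2) = inf_a (phi(a) + phi(-a)) / 2."""
    return float(np.min(0.5 * (phi(grid) + phi(-grid))))

def satisfies_condition(phi, grid=None, tol=1e-9):
    """Check C_phi^*(1/2) < phi(0) up to a small numerical tolerance."""
    if grid is None:
        grid = np.linspace(-10.0, 10.0, 2001)  # assumed search range
    return c_star_half(phi, grid) < float(phi(np.float64(0.0))) - tol

if __name__ == "__main__":
    for name, phi in [("hinge", hinge), ("logistic", logistic), ("rho-margin", rho_margin)]:
        print(name, satisfies_condition(phi))
```

For a convex loss, $\tfrac{1}{2}(\phi(\alpha) + \phi(-\alpha)) \ge \phi(0)$ for every $\alpha$, so the strict inequality cannot hold; the grid check reflects this for the hinge and logistic losses, while the $\rho$-margin loss attains $\tfrac{1}{2} < 1 = \phi(0)$.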
Abstract
We study the consistency of surrogate risks for robust binary classification. It is common to learn robust classifiers by adversarial training, which seeks to minimize the expected $0$-$1$ loss when each example can be maliciously corrupted within a small ball. We give a simple and complete characterization of the set of surrogate loss functions that are \emph{consistent}, i.e., that can replace the $0$-$1$ loss without affecting the minimizing sequences of the original adversarial risk, for any data distribution. We also prove a quantitative version of adversarial consistency for the $\rho$-margin loss. Our results reveal that the class of adversarially consistent surrogates is substantially smaller than in the standard setting, where many common surrogates are known to be consistent.
