A Finer Calibration Analysis for Adversarial Robustness
Pranjal Awasthi, Anqi Mao, Mehryar Mohri, Yutao Zhong
TL;DR
The paper studies ${\mathscr H}$-calibration for adversarially robust binary classification, distinguishing non-uniform from uniform calibration and correcting previous calibration results (Bao et al., 2020). It shows that common convex surrogate losses are not ${\mathscr H}$-calibrated with respect to the adversarial loss $\ell_{\gamma}$, while carefully constructed non-convex margin-based and $\rho$-margin losses can be calibrated and, under realizability, are ${\mathscr H}$-consistent. The results hold for broad hypothesis sets, including generalized linear models and ReLU networks, extending and strengthening previous work (Bao et al., 2020; Awasthi et al., 2021) and removing restrictive unboundedness assumptions. Together, these findings guide the design of theoretically sound surrogate losses for adversarial robustness across a wide range of models and settings.
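To make the objects in this summary concrete, here is a minimal sketch of the adversarial 0/1 loss $\ell_{\gamma}$ and a supremum-based $\rho$-margin surrogate. It rests on assumptions not stated in the summary: a linear hypothesis $x \mapsto \langle w, x \rangle$, an $\ell_2$ perturbation ball of radius $\gamma$, and the convention that a non-positive margin counts as an error. Under these assumptions the worst-case margin has the closed form $y \langle w, x \rangle - \gamma \|w\|_2$. All function names are hypothetical and chosen for illustration only.

```python
import numpy as np


def rho_margin_loss(t, rho=1.0):
    """rho-margin loss: min(1, max(0, 1 - t / rho)); non-increasing in t."""
    return np.minimum(1.0, np.maximum(0.0, 1.0 - t / rho))


def worst_case_margin(w, x, y, gamma):
    """Smallest margin y * <w, x'> over ||x' - x||_2 <= gamma.

    For a linear hypothesis (an assumption of this sketch) the infimum
    equals y * <w, x> - gamma * ||w||_2.
    """
    return y * np.dot(w, x) - gamma * np.linalg.norm(w)


def adversarial_zero_one_loss(w, x, y, gamma):
    """Adversarial 0/1 loss: 1 if some allowed perturbation gives margin <= 0."""
    return float(worst_case_margin(w, x, y, gamma) <= 0)


def sup_rho_margin_loss(w, x, y, gamma, rho=1.0):
    """Supremum of the rho-margin loss over the perturbation ball.

    Because the rho-margin loss is non-increasing, the supremum is
    attained at the worst-case margin.
    """
    return float(rho_margin_loss(worst_case_margin(w, x, y, gamma), rho))


if __name__ == "__main__":
    w = np.array([3.0, 4.0])  # ||w||_2 = 5
    x = np.array([1.0, 0.0])
    for gamma in (0.2, 0.5, 0.6):
        print(gamma,
              adversarial_zero_one_loss(w, x, 1, gamma),
              sup_rho_margin_loss(w, x, 1, gamma))
```

In this toy setting the surrogate upper-bounds the adversarial 0/1 loss pointwise, which is the basic property one wants before asking whether minimizing the surrogate is calibrated or consistent for $\ell_{\gamma}$.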
Abstract
We present a more general analysis of ${\mathscr H}$-calibration for adversarially robust classification. By adopting a finer definition of calibration, we can cover settings beyond the restricted hypothesis sets studied in previous work. In particular, our results hold for most common hypothesis sets used in machine learning. We both fix some previous calibration results (Bao et al., 2020) and generalize others (Awasthi et al., 2021). Moreover, our calibration results, combined with the previous study of consistency by Awasthi et al. (2021), also lead to more general ${\mathscr H}$-consistency results covering common hypothesis sets.
