Expressive Losses for Verified Robustness via Convex Combinations
Alessandro De Palma, Rudy Bunel, Krishnamurthy Dvijotham, M. Pawan Kumar, Robert Stanforth, Alessio Lomuscio
TL;DR
This work tackles the gap between empirical adversarial robustness and formal verifiability by introducing the notion of loss expressivity: the ability of a family of losses, parameterized by an over-approximation coefficient $\alpha \in [0,1]$, to interpolate between an adversarial loss (a lower bound to the worst-case loss) and a verified loss (an upper bound). It shows that simple instantiations based on convex combinations (CC-IBP, MTL-IBP, Exp-IBP) achieve state-of-the-art robustness–accuracy trade-offs across multiple vision benchmarks, supporting the hypothesis that expressivity is key to performance. The authors relate expressivity to existing methods such as SABR and provide extensive experiments on the nuanced role of the over-approximation coefficient, showing that better approximations of the worst-case loss do not necessarily lead to better results. Code and pseudo-code are released to enable reproducibility and further exploration of expressive losses.
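To make the three convex combinations concrete, here is a minimal sketch for a single example. It assumes each loss acts on logit differences m_j = f_y - f_j (with m_y = 0). m_adv holds an attack's margins, and m_ibp holds IBP lower bounds on the same margins, so m_ibp <= m_adv elementwise. The helper names (ce_from_margins, cc_ibp, mtl_ibp, exp_ibp) and the example numbers are hypothetical. The forms shown are a plausible reading of the method names: CC-IBP mixes margins, MTL-IBP mixes losses, and Exp-IBP, by assumption, mixes log-losses. Check the paper for the exact definitions.

```python
import math
from typing import Sequence


def ce_from_margins(margins: Sequence[float]) -> float:
    """Cross-entropy from logit differences m_j = f_y - f_j (so m_y = 0).

    -log softmax(f)_y = log(sum_j exp(f_j - f_y)) = log(sum_j exp(-m_j)).
    Larger margins give a smaller loss, so lower bounds on the margins
    yield upper bounds on the loss.
    """
    neg = [-m for m in margins]
    top = max(neg)  # shift for a numerically stable log-sum-exp
    return top + math.log(sum(math.exp(v - top) for v in neg))


def cc_ibp(m_adv: Sequence[float], m_ibp: Sequence[float], alpha: float) -> float:
    """Convex combination of the margins, then cross-entropy."""
    mixed = [(1 - alpha) * a + alpha * b for a, b in zip(m_adv, m_ibp)]
    return ce_from_margins(mixed)


def mtl_ibp(m_adv: Sequence[float], m_ibp: Sequence[float], alpha: float) -> float:
    """Convex combination of the adversarial and IBP losses."""
    return (1 - alpha) * ce_from_margins(m_adv) + alpha * ce_from_margins(m_ibp)


def exp_ibp(m_adv: Sequence[float], m_ibp: Sequence[float], alpha: float) -> float:
    """Assumed form: convex combination of log-losses (weighted geometric mean)."""
    return ce_from_margins(m_adv) ** (1 - alpha) * ce_from_margins(m_ibp) ** alpha


# Hypothetical margins for one example with label 0: an attack's margins
# (m_adv) and IBP lower bounds over the same region (m_ibp <= m_adv).
m_adv = [0.0, 2.0, 1.5]
m_ibp = [0.0, -1.0, -0.5]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, cc_ibp(m_adv, m_ibp, alpha),
          mtl_ibp(m_adv, m_ibp, alpha), exp_ibp(m_adv, m_ibp, alpha))
```

At $\alpha = 0$ all three reduce to the adversarial loss, and at $\alpha = 1$ they reduce to the IBP loss. In this toy example each one moves monotonically between the two as $\alpha$ grows, which is the single-parameter trade-off described as expressivity. The three differ only in how they traverse that range.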
Abstract
In order to train networks for verified adversarial robustness, it is common to over-approximate the worst-case loss over perturbation regions, resulting in networks that attain verifiability at the expense of standard performance. As shown in recent work, better trade-offs between accuracy and robustness can be obtained by carefully coupling adversarial training with over-approximations. We hypothesize that the expressivity of a loss function, which we formalize as the ability to span a range of trade-offs between lower and upper bounds to the worst-case loss through a single parameter (the over-approximation coefficient), is key to attaining state-of-the-art performance. To support our hypothesis, we show that trivial expressive losses, obtained via convex combinations between adversarial attacks and IBP bounds, yield state-of-the-art results across a variety of settings in spite of their conceptual simplicity. We provide a detailed analysis of the relationship between the over-approximation coefficient and performance profiles across different expressive losses, showing that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.
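The upper bounds that the convex combinations rely on come from IBP (interval bound propagation). Below is a minimal sketch of how such bounds can be computed for a tiny ReLU network over an $\ell_\infty$ ball. It assumes a two-layer MLP stored as plain Python lists. The network shape, weights, input, radius, and helper names (affine_bounds, ibp_margin_lower_bounds) are all hypothetical. The function returns lower bounds on the logit differences $f_y - f_j$. Feeding them to cross-entropy yields an upper bound on the worst-case loss over the ball, whereas the margins at any attacked point yield a lower bound.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]


def affine(W: Matrix, b: Vector, x: Sequence[float]) -> Vector:
    """Compute W @ x + b for a dense layer stored as nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, t) for t in v]


def forward(params: Sequence[Layer], x: Sequence[float]) -> Vector:
    """Logits of a two-layer MLP: affine -> ReLU -> affine."""
    (W1, b1), (W2, b2) = params
    return affine(W2, b2, relu(affine(W1, b1, x)))


def affine_bounds(W: Matrix, b: Vector, lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """Propagate the box [lo, hi] through W x + b in center/radius form."""
    center = [(l + h) / 2 for l, h in zip(lo, hi)]
    radius = [(h - l) / 2 for l, h in zip(lo, hi)]
    c_out = affine(W, b, center)
    r_out = [sum(abs(w) * r for w, r in zip(row, radius)) for row in W]
    return ([c - r for c, r in zip(c_out, r_out)],
            [c + r for c, r in zip(c_out, r_out)])


def ibp_margin_lower_bounds(params: Sequence[Layer], x: Sequence[float],
                            eps: float, y: int) -> Vector:
    """IBP lower bounds on f_y(x') - f_j(x') over ||x' - x||_inf <= eps."""
    (W1, b1), (W2, b2) = params
    lo, hi = [xi - eps for xi in x], [xi + eps for xi in x]
    lo, hi = affine_bounds(W1, b1, lo, hi)
    lo, hi = relu(lo), relu(hi)  # ReLU is monotone: map the endpoints directly
    lo, hi = affine_bounds(W2, b2, lo, hi)
    # Naive last step: smallest possible f_y minus largest possible f_j.
    return [0.0 if j == y else lo[y] - hi[j] for j in range(len(lo))]


# Hypothetical toy network (3 inputs, 4 hidden units, 2 classes), illustration only.
rng = random.Random(0)
PARAMS = [
    ([[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)],
     [rng.uniform(-0.1, 0.1) for _ in range(4)]),
    ([[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)], [0.0, 0.0]),
]
x0, label, eps = [0.2, -0.4, 0.7], 0, 0.1
bounds = ibp_margin_lower_bounds(PARAMS, x0, eps, label)
print(bounds)
```

The final step subtracts interval endpoints independently, which is loose. A common refinement folds the difference $f_y - f_j$ into the last affine layer before bounding it.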
