Expressive Losses for Verified Robustness via Convex Combinations

Alessandro De Palma, Rudy Bunel, Krishnamurthy Dvijotham, M. Pawan Kumar, Robert Stanforth, Alessio Lomuscio

TL;DR

This work tackles the gap between empirical adversarial robustness and formal verifiability by introducing the concept of loss expressivity: a family of losses parameterized by $\alpha \in [0,1]$ that interpolates between the adversarial loss and a verifiable loss. It shows that simple convex-combination instantiations (CC-IBP, MTL-IBP, Exp-IBP) can achieve state-of-the-art robustness–accuracy trade-offs across multiple vision benchmarks, supporting the claim that expressivity is the key driver of performance. The authors connect expressivity to existing methods like SABR and provide extensive experiments demonstrating the nuanced role of the over-approximation coefficient, including when better worst-case approximations do not guarantee better results. Code and pseudo-code are released to enable reproducibility and further exploration of expressive losses.

Abstract

In order to train networks for verified adversarial robustness, it is common to over-approximate the worst-case loss over perturbation regions, resulting in networks that attain verifiability at the expense of standard performance. As shown in recent work, better trade-offs between accuracy and robustness can be obtained by carefully coupling adversarial training with over-approximations. We hypothesize that the expressivity of a loss function, which we formalize as the ability to span a range of trade-offs between lower and upper bounds to the worst-case loss through a single parameter (the over-approximation coefficient), is key to attaining state-of-the-art performance. To support our hypothesis, we show that trivial expressive losses, obtained via convex combinations between adversarial attacks and IBP bounds, yield state-of-the-art results across a variety of settings in spite of their conceptual simplicity. We provide a detailed analysis of the relationship between the over-approximation coefficient and performance profiles across different expressive losses, showing that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.
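
As a rough illustration of what "convex combinations between adversarial attacks and IBP bounds" could look like, the sketch below mixes an adversarial loss with an IBP-style upper bound through a coefficient `alpha`. It is a toy under stated assumptions, not the authors' implementation: the function names (`cross_entropy`, `worst_case_logits`, `loss_level_combination`, `logit_level_combination`) are hypothetical, the adversarial logits and the IBP logit bounds are taken as given inputs, and the paper's exact CC-IBP, MTL-IBP and Exp-IBP definitions may differ in detail.

```python
import math


def cross_entropy(logits, label):
    """Numerically stable cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def worst_case_logits(lower, upper, label):
    """Pessimistic logits: lower bound for the true class, upper bound elsewhere.

    Assumes [lower, upper] are sound bounds on the logits over the
    perturbation region (e.g. obtained via IBP).
    """
    return [lower[k] if k == label else upper[k] for k in range(len(lower))]


def loss_level_combination(adv_logits, lower, upper, label, alpha):
    """Convex combination of the adversarial loss and the bound-based loss."""
    adv_loss = cross_entropy(adv_logits, label)
    ver_loss = cross_entropy(worst_case_logits(lower, upper, label), label)
    return (1.0 - alpha) * adv_loss + alpha * ver_loss


def logit_level_combination(adv_logits, lower, upper, label, alpha):
    """Loss evaluated on a convex combination of adversarial and pessimistic logits."""
    worst = worst_case_logits(lower, upper, label)
    mixed = [(1.0 - alpha) * a + alpha * w for a, w in zip(adv_logits, worst)]
    return cross_entropy(mixed, label)


# Made-up numbers for illustration only: the adversarial logits lie inside the bounds.
adv_logits = [2.0, 0.5, -1.0]
lower = [1.0, -0.5, -2.0]
upper = [3.0, 1.5, 0.0]
label = 0

for alpha in (0.0, 0.5, 1.0):
    print(
        alpha,
        loss_level_combination(adv_logits, lower, upper, label, alpha),
        logit_level_combination(adv_logits, lower, upper, label, alpha),
    )
```

In both variants, `alpha = 0` recovers the adversarial loss and `alpha = 1` recovers the bound-based loss, so a single parameter sweeps the range between a lower and an upper bound to the worst-case loss, provided the adversarial logits lie within the supplied bounds.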

Paper Structure

This paper contains 51 sections, 11 theorems, 25 equations, 4 figures, 11 tables, 1 algorithm.

Key Result

Proposition 4.0

If $\mathcal{L}(\cdot, y)$ is continuous with respect to its first argument, the parametrized loss $\mathcal{L}_{\alpha, \text{CC}} (\bm{\theta}, \mathbf{x}, y)$ is expressive according to Definition 3.1.

Figures (4)

  • Figure 1: Sensitivity of CC-IBP, MTL-IBP and Exp-IBP to the convex combination coefficient $\alpha$. We report standard, adversarial and verified robust accuracies (with different verifiers) under $\ell_\infty$ perturbations on the first $1000$ images of the CIFAR-10 test set. The legend in plot \ref{fig:alpha-sensitivity-mtlibp-2} applies to all sub-figures.
  • Figure 2: Relationship between the over-approximation coefficient $\alpha$ and the standard loss, the branch-and-bound loss, its approximation error and RMSE (relative to the expressive loss employed during training) on a holdout validation set of CIFAR-10. While the standard loss is computed on the entire validation set, branch-and-bound-related statistics are limited to the first $100$ images. The dashed vertical line denotes the $\alpha$ value minimizing the sum of the standard and branch-and-bound losses. The legend in plot \ref{fig:2-255-tuning-mtlibp} applies to all sub-figures.
  • Figure 3: IBP ReLU over-approximation, with $\hat{x} \in [\hat{l}, \hat{u}]$ for the considered input domain.
  • Figure 4: Loss values, computed on the full CIFAR-10 test set, for the models from Figure \ref{fig:alpha-sensitivity}. The CC-IBP, MTL-IBP and Exp-IBP losses (see $\S$\ref{sec:convexcombinations}) are computed using the $\alpha$ value employed for training, denoted on the $x$ axis. The legend in plot \ref{fig:alpha-sensitivity-loss-mtlibp-2} applies to all sub-figures.
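
The IBP ReLU over-approximation mentioned in the Figure 3 caption can be sketched in a few lines. The snippet below is a minimal illustration of standard interval bound propagation through one affine layer followed by a ReLU; the helper names (`affine_bounds`, `relu_bounds`), the weights and the input are all hypothetical and are not taken from the paper.

```python
def affine_bounds(weights, bias, lower, upper):
    """Propagate the box [lower, upper] through x -> W x + b.

    A positive weight takes the lower input bound for the output lower bound,
    a negative weight takes the upper input bound (and vice versa).
    """
    out_lower, out_upper = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(l, 0.0) for l in lower], [max(u, 0.0) for u in upper]


# Made-up layer and input, with an l_inf ball of radius eps around x.
weights = [[1.0, -1.0], [0.5, 2.0]]
bias = [0.0, -1.0]
x = [0.5, 0.5]
eps = 0.1

in_lower = [v - eps for v in x]
in_upper = [v + eps for v in x]

pre_lower, pre_upper = affine_bounds(weights, bias, in_lower, in_upper)
post_lower, post_upper = relu_bounds(pre_lower, pre_upper)
print(pre_lower, pre_upper)
print(post_lower, post_upper)
```

Every point in the input box produces post-activation values inside `[post_lower, post_upper]`. The bounds are sound but generally loose, which is the over-approximation such training schemes pay for in exchange for verifiability.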

Theorems & Definitions (20)

  • Definition 3.1
  • Proposition 4.0
  • Proposition 4.0
  • Proposition 4.0
  • Proposition B.0
  • proof
  • Proposition B.0
  • proof
  • Proposition B.0
  • proof
  • ...and 10 more