Group-based Robustness: A General Framework for Customized Robustness in the Real World

Weiran Lin, Keane Lucas, Neo Eyal, Lujo Bauer, Michael K. Reiter, Mahmood Sharif

TL;DR

This work introduces group-based robustness (GBR), a metric that evaluates a model’s resistance to group-level misclassifications from a source class set $S$ to a disjoint target set $T$, addressing real-world threats that traditional benign accuracy, untargeted robustness, and targeted robustness miss. It formalizes the GBR framework, demonstrates that existing metrics can be orthogonal to GBR, and proposes two efficient loss functions (MDMAX and MDMUL) alongside three attack strategies to compute GBR more quickly. The authors also present a defense built on tailored adversarial training (MDTRAIN) and data-fetching strategies that achieve up to $3.52\times$ improvements in GBR while maintaining benign accuracy. The results span multiple domains (GTSRB, PubFig, ImageNet, SST-5) and show that GBR reveals threat sensitivities not captured by conventional metrics, guiding safer, more practical robustness evaluations and defenses.
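To make the metric concrete, the sketch below estimates GBR empirically from the outcome of an attack. It assumes one plausible reading of the summary above: GBR is the fraction of inputs whose true class lies in $S$ that the attack fails to push into any class of $T$. The function name `group_based_robustness` and its parameters are hypothetical, and the paper's formal definition may differ in detail.

```python
from typing import Sequence, Set


def group_based_robustness(
    true_labels: Sequence[int],
    adversarial_predictions: Sequence[int],
    source_classes: Set[int],
    target_classes: Set[int],
) -> float:
    """Fraction of source-class inputs NOT misclassified into the target set.

    Assumed definition for illustration only; not taken from the paper.
    """
    if source_classes & target_classes:
        raise ValueError("source and target class sets must be disjoint")
    if len(true_labels) != len(adversarial_predictions):
        raise ValueError("labels and predictions must have the same length")

    total = 0   # inputs whose true class is in S
    evaded = 0  # of those, inputs whose adversarial prediction is in T
    for label, prediction in zip(true_labels, adversarial_predictions):
        if label not in source_classes:
            continue  # inputs outside S do not count toward the metric
        total += 1
        if prediction in target_classes:
            evaded += 1

    if total == 0:
        raise ValueError("no inputs belong to the source classes")
    return 1.0 - evaded / total


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 5]
    adv_preds = [7, 0, 8, 3, 7]
    print(group_based_robustness(labels, adv_preds, {0, 1}, {7, 8}))  # 0.5
```

In this toy run, the fourth input (true class 1, predicted 3) is misclassified but not into $T$, so it counts against untargeted robustness yet not against GBR. That gap is why the two metrics can disagree.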

Abstract

Machine-learning models are known to be vulnerable to evasion attacks that perturb model inputs to induce misclassifications. In this work, we identify real-world scenarios where the true threat cannot be assessed accurately by existing attacks. Specifically, we find that conventional metrics measuring targeted and untargeted robustness do not appropriately reflect a model's ability to withstand attacks from one set of source classes to another set of target classes. To address the shortcomings of existing methods, we formally define a new metric, termed group-based robustness, that complements existing metrics and is better-suited for evaluating model performance in certain attack scenarios. We show empirically that group-based robustness allows us to distinguish between models' vulnerability against specific threat models in situations where traditional robustness metrics do not apply. Moreover, to measure group-based robustness efficiently and accurately, we 1) propose two loss functions and 2) identify three new attack strategies. We show empirically that with comparable success rates, finding evasive samples using our new loss functions saves computation by a factor as large as the number of targeted classes, and finding evasive samples using our new attack strategies saves time by up to 99% compared to brute-force search methods. Finally, we propose a defense method that increases group-based robustness by up to $3.52\times$.
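The claimed saving "by a factor as large as the number of targeted classes" follows from replacing one targeted attack per class in $T$ with a single attack whose loss considers all of $T$ at once. The sketch below shows a generic margin loss taken over a target set, in the style of Carlini-Wagner margins. It is an illustration only: the definitions of MDMAX and MDMUL are not given in this section, and the function `group_margin_loss` is a hypothetical stand-in, not the paper's loss.

```python
from typing import Sequence, Set


def group_margin_loss(logits: Sequence[float], target_classes: Set[int]) -> float:
    """Margin between the best non-target logit and the best target logit.

    The value is negative exactly when the top-scoring class lies in the
    target set, so a single attack minimizing it aims at all of T at once.
    Illustrative assumption; not the paper's MDMAX or MDMUL.
    """
    others = [z for i, z in enumerate(logits) if i not in target_classes]
    targets = [z for i, z in enumerate(logits) if i in target_classes]
    if not targets or not others:
        raise ValueError("need at least one target and one non-target class")
    return max(others) - max(targets)


if __name__ == "__main__":
    T = {1, 2}
    print(group_margin_loss([2.0, 0.5, 1.0, 3.0], T))  # 2.0: prediction outside T
    print(group_margin_loss([0.1, 4.0, 1.0, 3.0], T))  # -1.0: prediction inside T
    # A brute-force search would run one targeted attack per class in T,
    # i.e. len(T) runs, versus one run with a set-level loss like this one.
    print(len(T))  # 2
```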

Paper Structure (49 sections, 7 equations, 17 figures)

Figures (17)

  • Figure 1: Traffic signs from GTSRB [IJCNN13:GTSRB]. The left three columns are speed-limit and delimit signs (i.e., ones that restrict speed limit or mark the end of restrictions). The rightmost column includes three signs that signify an immediate stop: no vehicles, no entry, and stop (from top to bottom).
  • Figure 2: Performance of models on GTSRB measured by four metrics: accuracy, untargeted robustness, targeted robustness, and group-based robustness. With each combination of $L_p$-norm and architecture, the distribution of group-based robustness, depicted as the wider boxes, is different from those of the other three metrics. With each combination, the performance of models varies only due to different randomly initialized weights, using seeds 0 to 99.
  • Figure 3: Pearson correlation coefficients between group-based robustness and three existing metrics: accuracy, untargeted robustness, and targeted robustness on GTSRB. Across most of the combinations of model architecture and $L_p$-norm, the correlations are negligible or weak as the coefficients have a magnitude smaller than 0.4 [AA18:correlation].
  • Figure 4: Performance of models measured by accuracy, untargeted robustness (UR), targeted robustness (TR), and group-based robustness (GBR). These models were adversarially trained with PubFig and ImageNet. On each model, GBR has a wide range due to different choices of $T$ and $S$, whereas the other metrics report only a single value that is sometimes out of the GBR range.
  • Figure 5: Robustness as defined by different metrics on the SST-5 dataset. The USE score, as used by T-PGD, denotes the imperceptibility of the perturbation: the higher the USE score is, the more imperceptible the perturbation is to humans. Group-based robustness has a wide range, whereas untargeted robustness and targeted robustness are sometimes out of the range.
  • ...and 12 more figures