OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift

Lin Li, Yifei Wang, Chawin Sitawarin, Michael Spratling

TL;DR

OODRobustBench is a benchmark for evaluating adversarial robustness under distribution shift. It covers 23 dataset-wise shifts and 6 threat-wise shifts, and is used to assess a model zoo of 706 robust models in 60.7K adversarial evaluations. The study finds a severe OOD generalization gap and a strong, positive, largely linear correlation between ID and OOD robustness. This correlation allows OOD robustness to be predicted from ID robustness, and the prediction suggests that existing methods are unlikely to achieve high OOD robustness. Further findings are that adversarial training (AT) strengthens ID–OOD correlations, that robustness degrades under distribution shift with some notable outliers, and that new methods, potentially combining adversarial training with dedicated OOD techniques, are needed. The work provides a practical platform and dataset for measuring OOD robustness and highlights directions for developing models that remain reliable when deployed in the wild.

Abstract

Existing works have made great progress in improving adversarial robustness, but typically test their methods only on data from the same distribution as the training data, i.e., in-distribution (ID) testing. As a result, it is unclear how such robustness generalizes under input distribution shifts, i.e., out-of-distribution (OOD) testing. This omission is concerning because such distribution shifts are unavoidable when methods are deployed in the wild. To address this issue, we propose a benchmark named OODRobustBench to comprehensively assess OOD adversarial robustness using 23 dataset-wise shifts (i.e., naturalistic shifts in input distribution) and 6 threat-wise shifts (i.e., unforeseen adversarial threat models). OODRobustBench is used to assess 706 robust models using 60.7K adversarial evaluations. This large-scale analysis shows that: 1) adversarial robustness suffers from a severe OOD generalization issue; 2) ID robustness correlates strongly with OOD robustness in a positive linear way. The latter enables the prediction of OOD robustness from ID robustness. We then predict and verify that existing methods are unlikely to achieve high OOD robustness. Novel methods are therefore required to achieve OOD robustness beyond our prediction. To facilitate the development of these methods, we investigate a wide range of techniques and identify several promising directions. Code and models are available at: https://github.com/OODRobustBench/OODRobustBench.
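
The idea of predicting OOD robustness from ID robustness with a linear fit can be illustrated with a minimal sketch. It is not the benchmark's own code: the function names (`fit_line`, `r_squared`, `predict_ood`) are hypothetical, the robust accuracies are made-up numbers, and extrapolating to an ID robustness of 100% is an assumption chosen only to show how such a fit could bound what existing methods might reach.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x: Sequence[float], y: Sequence[float],
              slope: float, intercept: float) -> float:
    """Coefficient of determination of the fitted line on (x, y)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y values must not all be identical")
    return 1.0 - ss_res / ss_tot


def predict_ood(id_robustness: float, slope: float, intercept: float) -> float:
    """Predict OOD robustness (%) from ID robustness (%) using the fitted line."""
    return slope * id_robustness + intercept


if __name__ == "__main__":
    # Synthetic robust accuracies (%) for five hypothetical models.
    id_rob = [45.0, 50.0, 55.0, 60.0, 65.0]
    ood_rob = [28.0, 32.5, 35.0, 39.5, 42.0]

    slope, intercept = fit_line(id_rob, ood_rob)
    print(f"slope={slope:.3f}, intercept={intercept:.3f}")
    print(f"R^2={r_squared(id_rob, ood_rob, slope, intercept):.3f}")
    # Assumed extrapolation: OOD robustness if ID robustness were 100%.
    print(f"predicted OOD at ID=100%: {predict_ood(100.0, slope, intercept):.1f}")
```

A slope below 1 together with a negative or small intercept would mean that even large gains in ID robustness translate into smaller gains in OOD robustness, which is the kind of reasoning behind the claim that existing methods are unlikely to reach high OOD robustness.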

Paper Structure (41 sections, 4 equations, 30 figures, 10 tables)

Figures (30)

  • Figure 1: The construction of OODRobustBench (top) and the correlation between ID and OOD robustness under 4 types of distribution shift for CIFAR10 $\ell_{\infty}$ (bottom). Each marker represents a model and is annotated by its training set-up. The solid blue line is the fitted linear correlation. The dashed gray line ($y=x$) represents perfect generalization where OOD robustness equals ID robustness. Deviation from the dashed line indicates robustness degradation under the respective distribution shift.
  • Figure 2: Degradation of accuracy and robustness under various distribution shifts for CIFAR10 $\ell_{\infty}$.
  • Figure 3: $R^2$ of regression between ID and OOD performance for Standardly-Trained (ST) and Adversarially-Trained (AT) models under various dataset shifts for CIFAR10 $\ell_{\infty}$. Higher $R^2$ implies stronger linear correlation. The results for ST models were copied from Miller et al. (2021). Some results of ST are missing (blank cells) because they were not reported in Miller et al. (2021).
  • Figure 4: $R^2$ of regression between ID seen robustness and OOD unforeseen robustness, i.e., threat shift.
  • Figure 5: Correlation between ID and OOD prediction agreement on adversarial examples for CIFAR10 $\ell_{\infty}$ AT models. Each point represents the prediction agreement of two models.
  • ...and 25 more figures
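
The prediction agreement plotted in Figure 5 can be sketched as follows, assuming agreement means the fraction of inputs on which two models output the same label, regardless of whether that label is correct. The function names and the toy predictions are hypothetical and are not taken from the benchmark's code.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def prediction_agreement(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Fraction of inputs on which two models predict the same label."""
    if len(preds_a) != len(preds_b) or len(preds_a) == 0:
        raise ValueError("need two non-empty prediction lists of equal length")
    matches = sum(1 for a, b in zip(preds_a, preds_b) if a == b)
    return matches / len(preds_a)


def pairwise_agreement(
    preds_by_model: Dict[str, Sequence[int]]
) -> Dict[Tuple[str, str], float]:
    """Agreement for every unordered pair of models (one point per pair)."""
    return {
        (m1, m2): prediction_agreement(preds_by_model[m1], preds_by_model[m2])
        for m1, m2 in combinations(sorted(preds_by_model), 2)
    }


if __name__ == "__main__":
    # Toy predicted labels on the same four adversarial examples.
    preds = {
        "model_a": [0, 1, 2, 3],
        "model_b": [0, 1, 0, 0],
        "model_c": [0, 1, 2, 0],
    }
    for pair, score in pairwise_agreement(preds).items():
        print(pair, score)
```

Computing this once on ID adversarial examples and once on OOD adversarial examples for each model pair would give the two coordinates of a point in a plot like Figure 5. Ground-truth labels are not needed for this quantity.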