Deciphering the Definition of Adversarial Robustness for post-hoc OOD Detectors

Peter Lorenz; Mario Fernandez; Jens Müller; Ullrich Köthe

TL;DR

This work targets the adversarial robustness of post-hoc OOD detectors, arguing that current benchmarks neglect adversarial examples (AdEx) and therefore overestimate real-world reliability. It extends the OpenOOD framework with evasion attacks and a Grad-CAM-based metric to quantify semantic shifts, evaluating 16 detectors at CIFAR and ImageNet scale. The paper revises existing definitions to incorporate attention maps and proposes a multi-level roadmap toward adversarial defense, treating Level 1 (AdEx on a unified dataset) as a baseline and Level 5 (defense against adaptive attacks) as the final level. Empirically, it finds that state-of-the-art post-hoc detectors offer limited robustness to AdEx, underscoring the need for standardized baselines and more robust, defense-oriented designs for open-world detection.
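The Grad-CAM-based metric mentioned above builds on class-activation heatmaps. As a reminder of how such a heatmap is formed, here is a minimal sketch of standard Grad-CAM (Selvaraju et al.) for a single convolutional layer: each channel weight is the spatial average of the gradient of the class score, and the heatmap is the ReLU of the weighted sum of feature maps. It assumes the activations and gradients were already extracted (in practice via framework hooks); the function name `grad_cam` and the nested-list input format are hypothetical choices made to keep the sketch dependency-free, not the paper's implementation.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap for one conv layer (hypothetical helper).

    activations: list of K feature maps A^k, each an H x W nested list.
    gradients:   list of K maps dy_c/dA^k with the same shape.
    Returns an H x W heatmap scaled so its maximum is 1 (if non-zero).
    """
    num_channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights alpha_k: global average pooling of the gradients.
    alphas = [
        sum(sum(row) for row in gradients[k]) / (height * width)
        for k in range(num_channels)
    ]

    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    heatmap = [
        [
            max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(num_channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale by the peak value; the map is non-negative after ReLU.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap
```

Because the gradients depend on the attacked input, an adversarial perturbation can shift this heatmap, which is what a Grad-CAM-based similarity between benign and attacked inputs is meant to capture.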

Abstract

Detecting out-of-distribution (OOD) inputs is critical for safely deploying deep learning models in real-world scenarios. In recent years, many OOD detectors have been developed, and benchmarking has even been standardized with OpenOOD. The number of post-hoc detectors is growing fast. They offer a way to protect a pre-trained classifier against natural distribution shifts and claim to be ready for real-world scenarios. However, their effectiveness against adversarial examples (AdEx) has been neglected in most studies. Where an OOD detector does include AdEx in its experiments, the lack of uniform AdEx parameters makes it difficult to evaluate the detector's performance accurately. This paper investigates the adversarial robustness of 16 post-hoc detectors against various evasion attacks. It also discusses a roadmap of adversarial defense levels that would help improve the adversarial robustness of OOD detectors. We believe that level 1 (AdEx on a unified dataset) should be added to the evaluation of any OOD detector to reveal its limitations. We added the last level of the roadmap (defense against adaptive attacks) for completeness from an adversarial machine learning (AML) point of view; we do not believe it is the ultimate goal for OOD detectors.
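Post-hoc detectors score the outputs of a frozen, pre-trained classifier, and their robustness to AdEx is typically measured by how well those scores separate benign inputs from attacked ones. The sketch below illustrates this with two widely used post-hoc scores, maximum softmax probability (MSP) and the energy score, plus AUROC computed via the Mann-Whitney statistic. These two scores are chosen for illustration only; this excerpt does not list which 16 detectors the paper evaluates, and the function names are hypothetical rather than OpenOOD's API.

```python
import math


def msp_score(logits):
    """Maximum softmax probability: higher means 'more in-distribution'."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # shift by max for numerical stability
    return max(exps) / sum(exps)


def energy_score(logits, temperature=1.0):
    """Negative free energy, T * logsumexp(z / T): higher means 'more in-distribution'."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    return temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def auroc(id_scores, ood_scores):
    """AUROC as P(score_ID > score_OOD), counting ties as 1/2."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


if __name__ == "__main__":
    # Toy, made-up logits: benign inputs vs. adversarial inputs that the
    # detector should flag. A confident misclassification (second AdEx)
    # yields a high score and therefore evades an MSP-style detector.
    benign_logits = [[4.0, 0.5, 0.1], [3.0, 1.0, 0.2]]
    adex_logits = [[1.0, 1.1, 0.9], [0.1, 5.0, 0.2]]
    benign = [msp_score(z) for z in benign_logits]
    adex = [msp_score(z) for z in adex_logits]
    print("AUROC (benign vs. AdEx):", auroc(benign, adex))
```

An AUROC near 1.0 means the detector flags the AdEx reliably; values near 0.5 (or below) mean the attack has pushed the adversarial scores into the benign range, which is the failure mode the abstract warns about.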

Paper Structure (16 sections, 1 equation, 1 figure, 4 tables)

This paper contains 16 sections, 1 equation, 1 figure, 4 tables.

INTRODUCTION
RELATED WORK
Evasion Attacks Crafting Inliers
Advantages of Post-Hoc OOD Detectors
OOD Adversarial Detection
CAM-based Explanations and Consistent Shift by AdEx
DEFINITIONS IN OOD DETECTION
Existing Definitions
Extension to Adversarial Robust Definition
EXPERIMENTS
Experiment Setup
Grad-CAM Similarity
DISCUSSION
Levels of Adversarial Robustness - From Detector towards Defense
CONCLUSION
...and 1 more section

Figures (1)

Figure 1: Grad-CAM comparison between benign samples and their attacked counterparts. Red indicates high heatmap intensity and blue indicates low intensity. Rows are sorted by dataset and attacked DNN, so the attacks can be compared row-wise. The DF attack yields very similar heatmaps across all datasets and models.
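One simple way to turn the visual comparison in Figure 1 into a number is to compute the cosine similarity between the benign and the attacked heatmap and average it over all pairs for a given attack. This is an assumed similarity measure for illustration; the exact Grad-CAM similarity used in the paper's "Grad-CAM Similarity" section is not given in this excerpt, and the helper names below are hypothetical.

```python
import math


def flatten(heatmap):
    """Flatten an H x W nested-list heatmap into a single list."""
    return [v for row in heatmap for v in row]


def heatmap_cosine_similarity(benign_map, attacked_map):
    """Cosine similarity of two same-sized heatmaps (in [0, 1] for non-negative maps)."""
    a = flatten(benign_map)
    b = flatten(attacked_map)
    if len(a) != len(b):
        raise ValueError("heatmaps must have the same number of pixels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero map carries no attention to compare
    return dot / (norm_a * norm_b)


def mean_similarity(pairs):
    """Average similarity over (benign, attacked) heatmap pairs, e.g. for one attack."""
    scores = [heatmap_cosine_similarity(b, a) for b, a in pairs]
    return sum(scores) / len(scores)
```

Under this measure, an attack like DF that leaves the heatmaps nearly unchanged would score close to 1, meaning the perturbation barely shifts where the model looks even though it changes the prediction.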
