Deciphering the Definition of Adversarial Robustness for post-hoc OOD Detectors
Peter Lorenz, Mario Fernandez, Jens Müller, Ullrich Köthe
TL;DR
This work examines the adversarial robustness of post-hoc OOD detectors, arguing that current benchmarks neglect adversarial examples (AdEx) and thus overestimate real-world reliability. It extends the OpenOOD framework with evasion attacks and a Grad-CAM–based metric to quantify semantic shifts, evaluating 16 detectors at CIFAR and ImageNet scale. The paper revises existing definitions to incorporate attention maps and proposes a multi-level roadmap toward adversarial defense, with Level 1 (AdEx on a unified dataset) as the baseline and Level 5 (defense against adaptive attacks) as the final level. Empirically, it finds that state-of-the-art post-hoc detectors provide limited robustness to AdEx, underscoring the need for standardized baselines and more robust, defense-oriented designs in open-world detection.
Abstract
Detecting out-of-distribution (OOD) inputs is critical for safely deploying deep learning models in real-world scenarios. In recent years, many OOD detectors have been developed, and even their benchmarking has been standardized, for instance in OpenOOD. The number of post-hoc detectors is growing fast. They offer a way to protect a pre-trained classifier against natural distribution shifts and claim to be ready for real-world scenarios. However, their effectiveness against adversarial examples (AdEx) has been neglected in most studies. Where an OOD detector does include AdEx in its experiments, the lack of uniform attack parameters makes it difficult to evaluate the detector's performance accurately. This paper investigates the adversarial robustness of 16 post-hoc detectors against various evasion attacks. It also discusses a roadmap for adversarial defense in OOD detectors that would improve their adversarial robustness. We believe that level 1 (AdEx on a unified dataset) should be added to the evaluation of any OOD detector to reveal its limitations. We added the last level in the roadmap (defense against adaptive attacks) for completeness from an adversarial machine learning (AML) point of view, although we do not believe it is the ultimate goal for OOD detectors.
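To make the idea of scoring a post-hoc detector on AdEx concrete, the following is a minimal sketch and not the paper's evaluation pipeline. It assumes a toy linear softmax classifier with random weights, uses the maximum softmax probability (MSP) as a stand-in post-hoc score, and uses a one-step FGSM perturbation as a stand-in evasion attack; the abstract names neither the detectors nor the attacks, so all function names, shapes, and the value of `eps` are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def msp_score(W, b, x):
    # Post-hoc score (assumed here): maximum softmax probability.
    # Higher values are read as "more in-distribution".
    return softmax(x @ W + b).max(axis=1)

def cross_entropy(W, b, x, y):
    # Per-sample cross-entropy of the toy classifier.
    p = softmax(x @ W + b)
    return -np.log(p[np.arange(len(y)), y] + 1e-12)

def fgsm(W, b, x, y, eps):
    # One-step L-infinity attack (assumed stand-in for an evasion attack).
    # For logits z = xW + b, d(CE)/dx = (softmax(z) - onehot(y)) @ W.T
    p = softmax(x @ W + b)
    onehot = np.eye(W.shape[1])[y]
    grad_x = (p - onehot) @ W.T
    return x + eps * np.sign(grad_x)

# Hypothetical toy setup: 8 input features, 3 classes, 16 samples.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))
b = np.zeros(3)
x = rng.normal(size=(16, 8))
y = (x @ W + b).argmax(axis=1)  # the model's own predictions serve as labels

eps = 0.1
x_adv = fgsm(W, b, x, y, eps)

# Score clean and adversarial inputs with the same fixed attack budget.
clean_scores = msp_score(W, b, x)
adv_scores = msp_score(W, b, x_adv)
ce_clean = cross_entropy(W, b, x, y)
ce_adv = cross_entropy(W, b, x_adv, y)
```

Comparing `clean_scores` with `adv_scores` under one fixed `eps` for every detector is the kind of uniform setting the abstract asks for; a real evaluation would replace the toy model with a pre-trained classifier and the single FGSM step with the attacks used in the paper.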
