Tuning Out-of-Distribution (OOD) Detectors Without Given OOD Data

Sudeepta Mondal; Xinyi Mary Xie; Ruxiao Duan; Alex Wong; Ganesh Sundaramoorthi

TL;DR

The paper tackles the practical challenge of tuning OOD detectors without access to a predefined OOD dataset, and addresses the performance variance caused by the ad hoc choice of tuning data. It introduces a self-contained method that simulates ID/OOD boundaries directly from in-distribution (ID) training data: random categories are withheld to create $N$ network variants, and detector parameters are optimized via an aggregated loss $\ell(\phi|M)$. The approach, grounded in Bayesian optimization, shows consistent gains for higher-parameter detectors (e.g., $VRA$, $PLF$) across CIFAR and ImageNet benchmarks, while remaining competitive for lower-parameter methods and sometimes outperforming tuning with real OOD data on OpenOOD benchmarks. This work enables more reliable OOD detection in real-world deployments where explicit OOD data may be unavailable, and suggests further exploration of scalable tuning for more complex tasks.

Abstract

Existing out-of-distribution (OOD) detectors are often tuned using a separate dataset deemed OOD with respect to the training distribution of a neural network (NN). OOD detectors process the activations of NN layers and score the output, where the parameters of the detectors are determined by fitting to an in-distribution (training) set and the aforementioned dataset, which is chosen in an ad hoc manner. At detector training time, this ad hoc dataset may be unavailable or difficult to obtain, and even when it is available, it may not be representative of actual OOD data, which often consists of "unknown unknowns." Current benchmarks may specify some left-out set from test OOD sets. We show that there can be significant variance in the performance of detectors depending on the ad hoc dataset chosen in the current literature, and thus even if such a dataset can be collected, the performance of the detector may be highly dependent on the choice. In this paper, we introduce and formalize the often neglected problem of tuning OOD detectors without a given "OOD" dataset. To this end, we present strong baselines as an attempt to approach this problem. Furthermore, we propose a new generic approach to OOD detector tuning that does not require any data beyond that used to train the NN. We show that our approach improves over baseline methods consistently across higher-parameter OOD detector families, while being comparable across lower-parameter families.
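The tuning idea summarized in the TL;DR (withhold random training categories, treat them as simulated OOD data, and pick detector parameters that minimize a loss aggregated over the resulting variants) can be illustrated with a small sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the "network variant" is replaced by class-mean prototypes on synthetic features, the detector is a toy temperature-scaled score with a single parameter `phi`, the per-variant loss is taken to be 1 - AUROC, the aggregate is a plain mean, and a small candidate grid stands in for the Bayesian optimization used in the paper. All function names are hypothetical.

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """Probability that a random ID sample scores higher than a random OOD sample."""
    diff = id_scores[:, None] - ood_scores[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def detector_score(features, prototypes, phi):
    """Toy detector (assumption): temperature-scaled log-sum-exp of negative
    squared distances to class prototypes; higher means more ID-like."""
    d2 = ((features[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    z = -d2 / phi
    m = z.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    return phi * (m[:, 0] + np.log(np.exp(z - m).sum(axis=1)))

def make_variants(labels, n_variants, n_withheld, rng):
    """Draw N random subsets of training categories to withhold as simulated OOD."""
    classes = np.unique(labels)
    return [rng.choice(classes, size=n_withheld, replace=False)
            for _ in range(n_variants)]

def aggregated_loss(phi, features, labels, variants):
    """Mean of (1 - AUROC) over all variants; the aggregation rule is an assumption."""
    losses = []
    for withheld in variants:
        ood_mask = np.isin(labels, withheld)
        kept = np.setdiff1d(np.unique(labels), withheld)
        # Stand-in for training a network variant on the kept categories only.
        prototypes = np.stack([features[labels == c].mean(axis=0) for c in kept])
        id_scores = detector_score(features[~ood_mask], prototypes, phi)
        ood_scores = detector_score(features[ood_mask], prototypes, phi)
        losses.append(1.0 - auroc(id_scores, ood_scores))
    return float(np.mean(losses))

def tune(features, labels, candidates, n_variants=4, n_withheld=2, seed=0):
    """Grid search over phi (a simple substitute for Bayesian optimization)."""
    rng = np.random.default_rng(seed)
    variants = make_variants(labels, n_variants, n_withheld, rng)
    losses = [aggregated_loss(phi, features, labels, variants) for phi in candidates]
    best = int(np.argmin(losses))
    return candidates[best], losses[best]

# Synthetic ID data: 6 Gaussian classes in 8 dimensions (illustrative only).
data_rng = np.random.default_rng(0)
n_classes, per_class, dim = 6, 50, 8
centers = data_rng.normal(scale=4.0, size=(n_classes, dim))
features = np.concatenate([c + data_rng.normal(size=(per_class, dim)) for c in centers])
labels = np.repeat(np.arange(n_classes), per_class)

candidates = [0.1, 1.0, 10.0, 100.0]
best_phi, best_loss = tune(features, labels, candidates)
print(f"selected phi={best_phi}, aggregated loss={best_loss:.3f}")
```

One simplification worth noting: the sketch scores ID samples on the same data used to build the prototypes, whereas a realistic setup would hold out a separate ID validation split for each variant.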

Paper Structure (19 sections, 14 equations, 1 figure, 5 tables, 4 algorithms)

Introduction
Contributions.
Related Work
Our Method
Baseline OOD Detector Tuning Methods
Experiments
Datasets and Settings
OOD Detectors Trained
Results
Ablation on number of simulated OOD categories
Limitations
Conclusion
Detector Definitions and Parameter Selection
ReAct.
ASH.
...and 4 more sections

Figures (1)

Figure 1: Performance of OOD detectors varies with tuning set. The parameters of several state-of-the-art OOD detectors are tuned using different predefined tuning datasets deemed OOD in the OpenOOD benchmark. The mean and standard deviation of the detectors' FPR95 (y-axis) over tuning sets are shown for each test OOD dataset (x-axis). Depending on the tuning dataset selected, the performance of the same detector can vary significantly (vertical bars), so much so that the rankings of detectors can change. Results on two different networks (DenseNet101, left, and ResNet-18, right) are shown. Our approach avoids the need for given OOD data, which may be difficult to obtain in practice, as well as the variance associated with the choice of such data, and still achieves performance comparable to tuning with the given test OOD sets in state-of-the-art benchmarks (see Table \ref{tab:table_val_set_tuning_comparison}).
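As a reading aid for the quantity plotted in Figure 1, the sketch below computes FPR95 under its common definition (the false positive rate on OOD samples at the threshold that retains 95% of ID samples) and then takes the mean and standard deviation over tuning sets for each test OOD set. It assumes higher scores mean "more in-distribution". All scores are synthetic, and each hypothetical tuning choice is mimicked by a shift in the OOD score distribution; none of the numbers correspond to results from the paper, and the function names are illustrative.

```python
import numpy as np

def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR95: fraction of OOD samples scoring at or above the threshold
    that keeps 95% of ID samples (higher score = more ID-like)."""
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    threshold = np.percentile(id_scores, 5)  # 95% of ID scores are >= this value
    return float(np.mean(ood_scores >= threshold))

def summarize_over_tuning_sets(fpr_table):
    """fpr_table[i][j] is the FPR95 on test OOD set j after tuning on set i.
    Returns the per-test-set mean and standard deviation across tuning sets."""
    fpr_table = np.asarray(fpr_table, dtype=float)
    return fpr_table.mean(axis=0), fpr_table.std(axis=0)

# Synthetic illustration: 3 hypothetical tuning sets (rows), 2 hypothetical test OOD sets (columns).
rng = np.random.default_rng(0)
id_scores = rng.normal(loc=2.0, size=1000)
fpr_table = [
    [fpr_at_95_tpr(id_scores, rng.normal(loc=shift + offset, size=1000))
     for offset in (0.0, -1.0)]      # one column per test OOD set
    for shift in (-0.5, 0.0, 0.5)    # one row per tuning choice
]
mean_fpr, std_fpr = summarize_over_tuning_sets(fpr_table)
print("mean FPR95 per test set:", mean_fpr)
print("std  FPR95 per test set:", std_fpr)
```

A large standard deviation relative to the gap between two detectors' means is the situation the caption describes, in which the ranking of detectors can flip depending on the tuning set.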
