Tuning Out-of-Distribution (OOD) Detectors Without Given OOD Data
Sudeepta Mondal, Xinyi Mary Xie, Ruxiao Duan, Alex Wong, Ganesh Sundaramoorthi
TL;DR
The paper tackles the practical challenge of tuning OOD detectors without access to a predefined OOD dataset, and the performance variance that arises when tuning data is chosen ad hoc. It introduces a self-contained method that simulates ID/OOD boundaries directly from in-distribution training data: random categories are withheld to create $N$ network variants, and detector parameters are optimized via an aggregated loss $\ell(\phi|M)$. The approach, grounded in Bayesian optimization, shows consistent gains for higher-parameter detectors (e.g., $VRA$, $PLF$) across CIFAR and ImageNet benchmarks, while remaining competitive for lower-parameter methods and sometimes outperforming tuning with real OOD data on OpenOOD benchmarks. This work enables more reliable OOD detection in real-world deployments where explicit OOD data may be unavailable, and it suggests further exploration of scalable tuning for more complex tasks.
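The class-withholding idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the use of 1 - AUROC as the per-variant loss, and the simple mean used for aggregation are all assumptions made for illustration; the summary does not specify the form of $\ell(\phi|M)$. In the actual method each variant would be a network trained only on the retained categories, whereas here `scores_input_per_variant` is a hypothetical stand-in for the activations of those networks.

```python
import numpy as np

def make_withheld_splits(labels, n_variants, n_withheld, seed=0):
    """For each of n_variants, withhold a random subset of classes.

    Returns a list of (withheld_classes, pseudo_ood_mask) pairs, where the
    mask marks samples whose class was withheld (treated as pseudo-OOD).
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    splits = []
    for _ in range(n_variants):
        withheld = rng.choice(classes, size=n_withheld, replace=False)
        pseudo_ood = np.isin(labels, withheld)
        splits.append((np.sort(withheld), pseudo_ood))
    return splits

def auroc(id_scores, ood_scores):
    """Probability that a random ID score exceeds a random OOD score
    (ties count as one half)."""
    diff = id_scores[:, None] - ood_scores[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

def aggregated_loss(phi, score_fn, scores_input_per_variant, splits):
    """Assumed aggregation: mean of (1 - AUROC) over all variants.

    score_fn(inputs, phi) is a hypothetical detector that maps per-sample
    inputs and detector parameters phi to scores (higher = more ID-like).
    """
    losses = []
    for inputs, (_, pseudo_ood) in zip(scores_input_per_variant, splits):
        scores = score_fn(inputs, phi)
        losses.append(1.0 - auroc(scores[~pseudo_ood], scores[pseudo_ood]))
    return float(np.mean(losses))
```

A tuner would then search over `phi` to minimize `aggregated_loss`. The paper describes this search as grounded in Bayesian optimization; a plain grid or random search over candidate `phi` values would be a simpler substitute for experimenting with the sketch.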
Abstract
Existing out-of-distribution (OOD) detectors are often tuned using a separate dataset deemed OOD with respect to the training distribution of a neural network (NN). OOD detectors process the activations of NN layers and score the output, where the parameters of the detectors are determined by fitting to an in-distribution (training) set and the aforementioned dataset, which is chosen ad hoc. At detector training time, this ad hoc dataset may be unavailable or difficult to obtain, and even when it is available, it may not be representative of actual OOD data, which often consists of "unknown unknowns." Current benchmarks may specify a left-out set drawn from the test OOD sets. We show that detector performance can vary significantly depending on the ad hoc dataset chosen in the current literature; thus, even if such a dataset can be collected, the performance of the detector may be highly dependent on that choice. In this paper, we introduce and formalize the often neglected problem of tuning OOD detectors without a given "OOD" dataset. To this end, we present strong baselines as a first attempt at this problem. Furthermore, we propose a new generic approach to OOD detector tuning that requires no data beyond that used to train the NN. We show that our approach improves over baseline methods consistently across higher-parameter OOD detector families, while being comparable across lower-parameter families.
