Taming False Positives in Out-of-Distribution Detection with Human Feedback
Harit Vishwakarma, Heguang Lin, Ramya Korlakai Vinayak
TL;DR
This work tackles safe OOD detection in open-world settings: it enforces a user-specified false positive rate $α$ while maximizing the true positive rate, using a human-in-the-loop framework that updates the detection threshold online from expert feedback. The framework is agnostic to the OOD scoring function. It combines an unbiased estimator of $FPR$ with a time-uniform upper confidence bound $ψ(t,δ)$, derived from the Law of the Iterated Logarithm, to guarantee $FPR(\hat{λ}_t) ≤ α$ at all times, and it comes with theoretical bounds on the time to feasibility $T_f$ and the time to $η$-optimality $T_{η-opt}$ in stationary settings. The approach extends to distribution shifts via a sliding-window adaptation and restart strategies, and experiments on synthetic and real OOD benchmarks with multiple scoring functions show that $FPR$ can be kept near $α$ while achieving high $TPR$. Overall, the framework offers practical, anytime-valid guarantees for safe OOD deployment in dynamic environments and reduces the reliance on pre-collected OOD data.
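The threshold update summarized above can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions: higher scores mean "more in-distribution", $FPR(λ)$ is the fraction of OOD samples scoring above $λ$, every OOD sample seen so far has been labeled by the expert (so the plain empirical mean stands in for the unbiased estimator), thresholds come from a finite candidate grid, and the confidence width only mimics the shape of a Law-of-the-Iterated-Logarithm bound, with placeholder constants `c1` and `c2`. The function names `lil_width` and `select_threshold` are hypothetical.

```python
import math
import numpy as np

def lil_width(n, delta, c1=1.0, c2=2.0):
    """LIL-shaped confidence width, roughly sqrt((log log n + log(1/delta)) / n).

    c1 and c2 are illustrative placeholders, not the constants of psi(t, delta).
    """
    if n == 0:
        return 1.0  # no feedback yet: the bound is vacuous
    loglog = math.log(math.log(max(n, 3)))  # clipped so it stays non-negative
    return math.sqrt(c1 * (loglog + math.log(c2 / delta)) / n)

def select_threshold(ood_scores, candidates, alpha, delta):
    """Smallest candidate threshold whose upper confidence bound on FPR is <= alpha.

    ood_scores: scores of samples the expert has labeled as OOD so far.
    Falls back to the largest (most conservative) candidate if none is certified.
    """
    ood_scores = np.asarray(ood_scores, dtype=float)
    candidates = np.sort(np.asarray(candidates, dtype=float))
    width = lil_width(len(ood_scores), delta)
    for lam in candidates:
        # Empirical FPR: fraction of labeled OOD samples that would pass as ID.
        fpr_hat = float(np.mean(ood_scores > lam)) if len(ood_scores) else 1.0
        if fpr_hat + width <= alpha:
            return float(lam)
    return float(candidates[-1])

# Synthetic usage: OOD scores drawn from a standard normal (an assumption).
rng = np.random.default_rng(0)
grid = np.linspace(-3.0, 5.0, 161)
lam_few = select_threshold(rng.normal(size=10), grid, alpha=0.05, delta=0.1)
lam_many = select_threshold(rng.normal(size=20_000), grid, alpha=0.05, delta=0.1)
print(lam_few, lam_many)
```

With only a handful of labeled OOD samples the width exceeds $α$, so the sketch stays at the most conservative threshold; as feedback accumulates the width shrinks and a lower threshold, and hence a higher TPR, becomes certifiable.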
Abstract
Robustness to out-of-distribution (OOD) samples is crucial for safely deploying machine learning models in the open world. Recent works have focused on designing scoring functions to quantify OOD uncertainty. Setting appropriate thresholds for these scoring functions for OOD detection is challenging as OOD samples are often unavailable up front. Typically, thresholds are set to achieve a desired true positive rate (TPR), e.g., 95% TPR. However, this can lead to very high false positive rates (FPR), ranging from 60% to 96%, as observed in the Open-OOD benchmark. In safety-critical real-life applications, e.g., medical diagnosis, controlling the FPR is essential when dealing with various OOD samples dynamically. To address these challenges, we propose a mathematically grounded OOD detection framework that leverages expert feedback to safely update the threshold on the fly. We provide theoretical results showing that it is guaranteed to meet the FPR constraint at all times while minimizing the use of human feedback. Another key feature of our framework is that it can work with any scoring function for OOD uncertainty quantification. Empirical evaluation of our system on synthetic and benchmark OOD datasets shows that our method can maintain FPR at most 5% while maximizing TPR.
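To make the conventional practice mentioned above concrete, the following sketch fixes the threshold at 95% TPR using in-distribution (ID) scores only and then measures the FPR that results. It is purely illustrative: the Gaussian score distributions are synthetic assumptions rather than benchmark data, higher scores are assumed to mean "more in-distribution", and the helper names `threshold_at_tpr` and `fpr_at` are hypothetical.

```python
import numpy as np

def threshold_at_tpr(id_scores, target_tpr=0.95):
    """Threshold such that roughly target_tpr of the ID scores lie above it."""
    id_scores = np.asarray(id_scores, dtype=float)
    return float(np.quantile(id_scores, 1.0 - target_tpr))

def fpr_at(ood_scores, lam):
    """Fraction of OOD samples scoring above lam, i.e. wrongly accepted as ID."""
    return float(np.mean(np.asarray(ood_scores, dtype=float) > lam))

# Synthetic, heavily overlapping score distributions (assumed for illustration).
rng = np.random.default_rng(0)
id_scores = rng.normal(loc=1.0, scale=1.0, size=10_000)
ood_scores = rng.normal(loc=0.0, scale=1.0, size=10_000)

lam95 = threshold_at_tpr(id_scores, target_tpr=0.95)
tpr = float(np.mean(id_scores > lam95))
fpr = fpr_at(ood_scores, lam95)
print(f"threshold={lam95:.3f}  TPR={tpr:.3f}  FPR={fpr:.3f}")
```

Because the threshold is chosen without looking at any OOD samples, nothing constrains the FPR; with overlapping distributions like these it ends up far above 5%, which is the failure mode the FPR-constrained formulation is meant to avoid.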
