Taming False Positives in Out-of-Distribution Detection with Human Feedback

Harit Vishwakarma; Heguang Lin; Ramya Korlakai Vinayak

TL;DR

This work tackles safe OOD detection in open-world settings by enforcing a user-specified false positive rate $\alpha$ while maximizing the true positive rate, through a human-in-the-loop framework that updates the detection threshold online using expert feedback. The framework is agnostic to the OOD scoring function. It combines an unbiased estimator of $\text{FPR}$ with a time-uniform upper confidence bound $\psi(t,\delta)$, derived from the Law of the Iterated Logarithm, to guarantee $\text{FPR}(\hat{\lambda}_t) \le \alpha$ at all times with probability at least $1-\delta$, and it bounds the time to feasibility $T_f$ and the time to $\eta$-optimality $T_{\eta\text{-opt}}$ in stationary settings. The approach extends to distribution shifts through sliding-window adaptation and restart strategies. Experiments on synthetic and real OOD benchmarks, across multiple scoring functions, show that $\text{FPR}$ can be kept near $\alpha$ while achieving high $\text{TPR}$. Overall, the framework offers practical, anytime-valid guarantees for safe OOD deployment in dynamic environments and reduces reliance on pre-collected OOD data.
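The threshold update described above can be sketched as "pick the smallest threshold whose upper confidence bound on FPR is at most $\alpha$". This is a minimal sketch under our own assumptions, not the paper's implementation: higher scores mean more in-distribution, a sample is accepted as ID when its score is at least $\lambda$, expert feedback yields a stream of OOD scores with importance weights, the FPR estimate is a self-normalized weighted fraction, and lil_radius is a hypothetical radius with the $\sqrt{\log\log n / n}$ shape of an LIL bound and a placeholder constant c. Neither function reproduces the paper's estimator or its $\psi(t,\delta)$; the optional window argument mimics the sliding-window variant.

```python
import math

import numpy as np


def lil_radius(n: int, delta: float, c: float = 1.0) -> float:
    """Hypothetical LIL-shaped radius, shrinking like sqrt(log(log n) / n).

    The form and the constant c are placeholders, not a calibrated bound.
    """
    if n <= 0:
        return math.inf
    return math.sqrt(c * (math.log(math.log(math.e * n)) + math.log(1.0 / delta)) / n)


def estimate_fpr(lam: float, ood_scores: np.ndarray, weights: np.ndarray) -> float:
    """Weighted fraction of OOD samples whose score is >= lam (accepted as ID)."""
    return float(weights[ood_scores >= lam].sum() / weights.sum())


def select_threshold(ood_scores, weights, alpha: float, delta: float, window=None) -> float:
    """Smallest observed OOD score lam with FPR_hat(lam) + psi <= alpha.

    Returns +inf (route every sample to the expert) when no candidate is feasible.
    window, if given (a positive int), keeps only the most recent OOD observations.
    """
    s = np.asarray(ood_scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    if window is not None:
        s, w = s[-window:], w[-window:]
    if s.size == 0:
        return math.inf
    psi = lil_radius(s.size, delta)
    order = np.argsort(s)
    s, w = s[order], w[order]
    # tail[k]: weighted fraction of OOD scores at sorted positions >= k.
    tail = np.cumsum(w[::-1])[::-1] / w.sum()
    # For ties, start from the first copy so every score >= s[k] is counted.
    fpr_hat = tail[np.searchsorted(s, s, side="left")]
    # fpr_hat is non-increasing in lam, so the first feasible index is the smallest lam.
    feasible = fpr_hat + psi <= alpha
    if not feasible.any():
        return math.inf
    return float(s[np.argmax(feasible)])


# Usage with hypothetical OOD scores drawn from N(0, 1) and unit weights.
rng = np.random.default_rng(0)
ood = rng.normal(0.0, 1.0, size=20_000)
lam = select_threshold(ood, np.ones_like(ood), alpha=0.05, delta=0.05)
print(lam)
```

Returning $+\infty$ when nothing is feasible is the conservative choice: every sample goes to the expert until enough OOD feedback has shrunk the radius. With 20,000 OOD scores from $\mathcal{N}(0,1)$, the selected threshold sits above the ideal $\lambda^\star \approx 1.645$, which is the price of the confidence margin.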

Abstract

Robustness to out-of-distribution (OOD) samples is crucial for safely deploying machine learning models in the open world. Recent works have focused on designing scoring functions to quantify OOD uncertainty. Setting appropriate thresholds on these scoring functions is challenging, as OOD samples are often unavailable up front. Typically, thresholds are set to achieve a desired true positive rate (TPR), e.g., $95\%$ TPR. However, this can lead to very high false positive rates (FPR), ranging from $60\%$ to $96\%$, as observed in the Open-OOD benchmark. In safety-critical applications such as medical diagnosis, controlling the FPR is essential when diverse OOD samples arrive dynamically. To address these challenges, we propose a mathematically grounded OOD detection framework that leverages expert feedback to safely update the threshold on the fly. We provide theoretical results showing that it is guaranteed to meet the FPR constraint at all times while minimizing the use of human feedback. Another key feature of our framework is that it works with any scoring function for OOD uncertainty quantification. Empirical evaluation on synthetic and benchmark OOD datasets shows that our method can keep the FPR at or below $5\%$ while maximizing TPR.
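To see how a $95\%$ TPR threshold can still accept most OOD samples, here is a small worked example with hypothetical Gaussian scores: ID scores $\mathcal{N}(1,1)$ and OOD scores $\mathcal{N}(0,1)$, with higher scores meaning more ID-like. These distributions are assumptions chosen to overlap heavily; they are not the Open-OOD scores. The sketch uses Python's standard-library statistics.NormalDist.

```python
from statistics import NormalDist

# Hypothetical, heavily overlapping score distributions (higher = more ID-like).
d_in = NormalDist(mu=1.0, sigma=1.0)
d_out = NormalDist(mu=0.0, sigma=1.0)


def threshold_for_tpr(dist_in: NormalDist, target_tpr: float) -> float:
    """Threshold lam with TPR(lam) = P_in(score >= lam) = target_tpr."""
    return dist_in.inv_cdf(1.0 - target_tpr)


def fpr_at(dist_out: NormalDist, lam: float) -> float:
    """FPR(lam) = P_out(score >= lam): OOD samples accepted as ID."""
    return 1.0 - dist_out.cdf(lam)


lam_95 = threshold_for_tpr(d_in, 0.95)
fpr_95 = fpr_at(d_out, lam_95)
print(f"threshold = {lam_95:.3f}, TPR = 0.95, FPR = {fpr_95:.3f}")
```

Here the threshold is about -0.645 and roughly $74\%$ of OOD samples are accepted as in-distribution: fixing TPR says nothing about FPR when the two score distributions overlap.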

Paper Structure

This paper contains 19 sections, 8 theorems, 38 equations, 34 figures, 2 tables, 2 algorithms.

Introduction
Human-in-the-Loop OOD Detection
Problem Setting
Adaptive Threshold Estimation
Theoretical Guarantees
Empirical Evaluation
Stationary Distributions Setting
Distribution Shift Setting
Related Works
Conclusion
Acknowledgments
Appendix
Glossary
Proofs
Additional Details of the Algorithm
...and 4 more sections

Key Result

Lemma 1

Let $p>0$. Then $\widehat{\text{FPR}}(\lambda,t)$, as defined in the empirical-FPR equation (eq. EmpFPR), is an unbiased estimate of the true $\text{FPR}(\lambda)$, i.e., $\mathbb{E}[\widehat{\text{FPR}}(\lambda,t)] = \text{FPR}(\lambda)$.
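Lemma 1 is an unbiasedness statement, and the standard way to obtain one under partial feedback is inverse-probability weighting. The Monte Carlo sketch below illustrates that idea under assumptions that are ours, not the paper's: samples scored below a query threshold are always reviewed, samples accepted as ID are reviewed independently with probability $p$, each reviewed sample is weighted by the inverse of its review probability, and the number of OOD samples in a batch is treated as known. The helper names and the $\mathcal{N}(0,1)$ OOD scores are hypothetical; this is not the paper's eq. EmpFPR.

```python
from statistics import NormalDist

import numpy as np


def ipw_fpr_estimate(scores, reviewed, review_prob, lam):
    """Inverse-probability-weighted estimate of FPR(lam) = P(score >= lam | OOD).

    scores: OOD scores per batch (batch size assumed known), shape (trials, n_out).
    reviewed: boolean mask of which points the expert labeled.
    review_prob: probability that each point was sent for review.
    """
    weights = np.where(reviewed, 1.0 / review_prob, 0.0)
    return np.mean(weights * (scores >= lam), axis=-1)


def simulate(n_out=200, trials=5000, lam=1.645, lam_query=1.0, p=0.2, seed=0):
    rng = np.random.default_rng(seed)
    scores = rng.normal(0.0, 1.0, size=(trials, n_out))  # hypothetical D_out = N(0, 1)
    # Points below lam_query are flagged as OOD and always reviewed;
    # points accepted as ID are reviewed only with probability p.
    review_prob = np.where(scores < lam_query, 1.0, p)
    reviewed = rng.random(size=scores.shape) < review_prob
    estimates = ipw_fpr_estimate(scores, reviewed, review_prob, lam)
    true_fpr = 1.0 - NormalDist(0.0, 1.0).cdf(lam)
    return estimates.mean(), true_fpr


mean_est, true_fpr = simulate()
print(mean_est, true_fpr)
```

Averaged over many batches, the weighted estimate matches $1 - \Phi(\lambda)$. Dropping the $1/p$ weights would instead shrink the estimate by roughly a factor of $p$ whenever $\lambda$ lies above the query threshold, which is why the weighting is needed.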

Figures (34)

Figure 1: Illustration of OOD detection with human-in-the-loop with FPR control. In this example, the ID data is of brain scans of normal people and those with Alzheimer's disease. The OOD data could be anything other than these, e.g. brain scans of patients with some other diseases.
Figure 2: $\text{FPR}(\lambda^{\star}) = \alpha \iff \text{CDF}_{\mathcal{D}_{\text{out}}}(\lambda^{\star}) = 1-\alpha$. Optimal $\lambda^\star$ for the ideal optimization problem (eq. ideal-opt) with $\alpha = 0.05$ and $x_t \overset{\text{i.i.d.}}{\sim} 0.7\,\mathcal{D}_{\text{in}} + 0.3\,\mathcal{D}_{\text{out}}$, where $\mathcal{D}_{\text{in}}$ is $\mathcal{N}(4, 1)$ and $\mathcal{D}_{\text{out}}$ is $\mathcal{N}(0, 1)$.
Figure 3: Illustration of the confidence intervals on FPR defined in eq. (psi-theory) and their effect on threshold estimation. As the system receives more OOD samples, the confidence intervals shrink and lead to better thresholds while remaining safe ($\hat{\lambda}_t \geq \lambda^\star$).
Figure 4: Results on the synthetic data with stationary distributions, $\gamma=0.2$, and using no window. Each method is repeated 10 times. The mean and standard deviation are shown.
Figure 5: Effect of using various window sizes in synthetic data experiments. The distribution shift starts at $t=50k$. The arrow indicates the time at which the mean FPR + std. deviation over 10 runs goes below 5% for the LIL method.
...and 29 more figures
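The optimum in Figure 2 has a closed form. Since $\text{FPR}(\lambda) = 1 - \text{CDF}_{\mathcal{D}_{\text{out}}}(\lambda)$, the constraint is tight at $\lambda^\star = \text{CDF}_{\mathcal{D}_{\text{out}}}^{-1}(1-\alpha)$: any smaller threshold violates it, and any larger one lowers TPR. The sketch below uses the caption's parameters with Python's standard-library statistics.NormalDist. The 0.7/0.3 mixing weights do not enter, because FPR and TPR are conditional on the OOD and ID components respectively.

```python
from statistics import NormalDist

alpha = 0.05
d_in = NormalDist(mu=4.0, sigma=1.0)   # D_in from Figure 2
d_out = NormalDist(mu=0.0, sigma=1.0)  # D_out from Figure 2

# FPR(lam) = 1 - CDF_out(lam), so FPR(lam*) = alpha  <=>  CDF_out(lam*) = 1 - alpha.
lam_star = d_out.inv_cdf(1.0 - alpha)
tpr_star = 1.0 - d_in.cdf(lam_star)   # TPR(lam) = P_in(score >= lam)
fpr_star = 1.0 - d_out.cdf(lam_star)  # equals alpha by construction
print(f"lambda* = {lam_star:.4f}, TPR = {tpr_star:.4f}, FPR = {fpr_star:.4f}")
```

This gives $\lambda^\star \approx 1.645$ with a TPR of about $0.99$ at $\text{FPR} = 0.05$.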

Theorems & Definitions (15)

Lemma 1
Definition 1
Theorem 1
Lemma 2
proof
Lemma 3
proof
Lemma 4
proof
Lemma 5
...and 5 more
