Iterative Deployment Exposure for Unsupervised Out-of-Distribution Detection
Lars Doorenbos, Raphael Sznitman, Pablo Márquez-Neila
TL;DR
This paper addresses the performance degradation of deep learning models on out-of-distribution (OOD) inputs in medical imaging by introducing Iterative Deployment Exposure (IDE), a deployment-aware setting in which unsupervised OOD (U-OOD) detectors are updated over time with unlabeled deployment data. It proposes CSO, a two-branch detector that gradually shifts from a few-shot learner to a strong binary classifier. CSO combines a novel MkNN-based U-OOD score (Mahalanobis distance combined with nearest neighbors) with a confidence-scaled few-shot OOD learner that learns from limited OOD examples. The method defines a contamination model for deployment data and uses bootstrapped uncertainty to calibrate learning between the two branches. Evaluated on three medical-imaging benchmarks, CSO outperforms strong baselines in time-evolving OOD detection. The work also provides new IDE benchmarks and time-aware evaluation metrics, with the aim of safer deployment-time OOD handling in medical imaging.
Abstract
Deep learning models are vulnerable to performance degradation when encountering out-of-distribution (OOD) images, potentially leading to misdiagnoses and compromised patient care. These shortcomings have led to great interest in the field of OOD detection. Existing unsupervised OOD (U-OOD) detection methods typically assume that OOD samples originate from an unconcentrated distribution complementary to the training distribution, neglecting the reality that deployed models passively accumulate task-specific OOD samples over time. To better reflect this real-world scenario, we introduce Iterative Deployment Exposure (IDE), a novel and more realistic setting for U-OOD detection. We propose CSO, a method for IDE that starts from a U-OOD detector that is agnostic to the OOD distribution and slowly refines it during deployment using observed unlabeled data. CSO uses a new U-OOD scoring function that combines the Mahalanobis distance with a nearest-neighbor approach, along with a novel confidence-scaled few-shot OOD detector to effectively learn from limited OOD examples. We validate our approach on a dedicated benchmark, showing that our method greatly improves upon strong baselines on three medical imaging modalities.
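To make the idea of combining the Mahalanobis distance with a nearest-neighbor approach concrete, the following is a minimal sketch and not the CSO scoring function itself, whose exact form is not given here. It assumes that features are whitened with the training covariance, so that Euclidean distances in the whitened space equal Mahalanobis distances, and that the score is the mean distance to the k nearest training features. The names `fit_whitening` and `mknn_score`, the regularizer `eps`, and the choice `k=5` are all hypothetical.

```python
import numpy as np

def fit_whitening(train_feats, eps=1e-6):
    """Estimate the mean and a whitening matrix from training features.

    Assumption: one shared covariance, regularized by eps for invertibility.
    """
    mean = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False) + eps * np.eye(train_feats.shape[1])
    # inv(cov) = W @ W.T, so ||(x - y) @ W||^2 is the squared Mahalanobis distance.
    whiten = np.linalg.cholesky(np.linalg.inv(cov))
    return mean, whiten

def mknn_score(test_feats, train_feats, mean, whiten, k=5):
    """Mean Mahalanobis distance to the k nearest training features.

    Higher scores suggest a sample is more likely OOD (illustrative only).
    """
    train_w = (train_feats - mean) @ whiten
    test_w = (test_feats - mean) @ whiten
    # Pairwise Euclidean distances in the whitened space, shape (n_test, n_train).
    dists = np.linalg.norm(test_w[:, None, :] - train_w[None, :, :], axis=-1)
    k = min(k, train_w.shape[0])
    # np.partition places the k smallest distances in the first k columns.
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)

# Toy usage on synthetic features (not medical-imaging data).
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 4))
inliers = rng.normal(size=(20, 4))
outliers = rng.normal(size=(20, 4)) + 8.0
mean, whiten = fit_whitening(train)
print("mean inlier score: ", mknn_score(inliers, train, mean, whiten).mean())
print("mean outlier score:", mknn_score(outliers, train, mean, whiten).mean())
```

In the IDE setting described above, a score of this kind would only serve as the OOD-agnostic starting point. The refinement from unlabeled deployment data is not covered by this sketch.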
