A Median Perspective on Unlabeled Data for Out-of-Distribution Detection
Momin Abbas, Ali Falahati, Hossein Goli, Mohammad Mohammadi Amiri
TL;DR
Medix is a median-centric, two-stage framework for out-of-distribution (OOD) detection. In the first stage, it applies a gradient-based, element-wise median filter to unlabeled in-the-wild data to identify candidate OOD outliers. In the second stage, it trains a dedicated OOD detector on in-distribution (InD) data plus the filtered outliers, guided by a surrogate loss that preserves InD performance. The theoretical analysis gives two-sided error bounds under a sub-Gaussian gradient assumption, along with a bound under a relaxed, non-sub-Gaussian assumption. These bounds highlight the contamination, concentration, and separation effects that govern robustness. Empirically, Medix outperforms 20 baselines on CIFAR-10/100 with wild data and also performs well in large-scale unseen OOD settings, while remaining computationally efficient. These results support the viability of median-based filtering for robust open-world OOD detection with unlabeled data.
Abstract
Out-of-distribution (OOD) detection plays a crucial role in ensuring the robustness and reliability of machine learning systems deployed in real-world applications. Recent approaches have explored the use of unlabeled data and shown its potential for enhancing OOD detection. However, using unlabeled in-the-wild data effectively remains challenging because such data mixes in-distribution (InD) and OOD samples. Without a distinct set of OOD samples, training an optimal OOD classifier is difficult. In this work, we introduce Medix, a novel framework that identifies potential outliers in unlabeled data using the median operation. We use the median as an OOD detection mechanism because its robustness to noise and outliers yields a stable estimate of central tendency. Using these identified outliers, along with labeled InD data, we train a robust OOD classifier. From a theoretical perspective, we derive error bounds showing that Medix achieves a low error rate. Empirical results substantiate these claims: Medix outperforms existing methods in open-world settings, consistent with our theoretical insights.
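To make the median-based filtering idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes per-sample gradient vectors have already been computed and stacked into arrays. The function name `median_filter_candidates`, the choice to compute the reference median from InD gradients, the Euclidean distance score, and the `quantile` threshold are all illustrative assumptions that the text above does not specify.

```python
import numpy as np


def median_filter_candidates(ind_grads, wild_grads, quantile=0.95):
    """Flag wild samples whose gradients lie far from an element-wise median.

    ind_grads:  (n_ind, d) per-sample gradient vectors for labeled InD data.
    wild_grads: (n_wild, d) per-sample gradient vectors for unlabeled wild data.
    quantile:   hypothetical knob; InD score quantile used as the threshold.
    """
    ind_grads = np.asarray(ind_grads, dtype=float)
    wild_grads = np.asarray(wild_grads, dtype=float)

    # Element-wise (coordinate-wise) median over samples: a reference gradient
    # that is robust to a minority of extreme values in each coordinate.
    # ASSUMPTION: the reference is computed from InD gradients.
    reference = np.median(ind_grads, axis=0)

    # ASSUMPTION: the score is the Euclidean distance to the reference.
    ind_scores = np.linalg.norm(ind_grads - reference, axis=1)
    wild_scores = np.linalg.norm(wild_grads - reference, axis=1)

    # ASSUMPTION: threshold chosen so that roughly `quantile` of InD samples pass.
    threshold = np.quantile(ind_scores, quantile)

    # True marks a candidate outlier to pass on to the detector-training stage.
    candidate_mask = wild_scores > threshold
    return candidate_mask, wild_scores, threshold


# Synthetic usage: 80 InD-like and 20 shifted samples stand in for wild data.
rng = np.random.default_rng(0)
ind = rng.normal(0.0, 1.0, size=(500, 8))
wild = np.vstack([
    rng.normal(0.0, 1.0, size=(80, 8)),   # InD-like portion of the wild set
    rng.normal(6.0, 1.0, size=(20, 8)),   # shifted portion, stands in for OOD
])
mask, scores, thr = median_filter_candidates(ind, wild)
```

In this toy setup the shifted samples sit far from the reference, so they exceed the threshold, while only a small share of the InD-like samples do. Real gradient distributions will be less cleanly separated, which is the regime the contamination and separation terms in the error bounds are meant to describe.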
