OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection
Jingyang Zhang, Jingkang Yang, Pengyun Wang, Haoqi Wang, Yueqian Lin, Haoran Zhang, Yiyou Sun, Xuefeng Du, Yixuan Li, Ziwei Liu, Yiran Chen, Hai Li
TL;DR
OpenOOD v1.5 addresses the need for scalable, standardized evaluation in Out-of-Distribution (OOD) detection. It extends the benchmark to large-scale datasets (ImageNet-1K/200) and foundation models (e.g., CLIP, DINOv2), and it introduces full-spectrum OOD detection, which couples semantic and covariate shifts. It formalizes OOD via density-based definitions and open space risk, and it defines a rigorous evaluation protocol with near- and far-OOD splits, dedicated validation sets, and disjoint OOD training/test data. The paper provides four standard and two full-spectrum benchmarks, analyzes 40 methods across diverse architectures, and delivers actionable insights such as the broad benefit of data augmentations and the nuanced effects of model architecture and of training versus post-hoc approaches. Its findings highlight that no single detector dominates across all settings, that full-spectrum detection remains a challenging open problem, and that foundation models show promise but require detector alignment. Collectively, OpenOOD v1.5 supplies a robust, scalable benchmark to accelerate progress in OOD detection. The open space risk that OOD detectors aim to minimize is expressed as $R_O(f)=\frac{\iint f(x)p_{\mathcal{D}_{OOD}}(x,y)\,dx\,dy}{\iint f(x)p_{\mathcal{D}_{OOD}}(x,y)\,dx\,dy+\iint f(x)p_{\mathcal{D}_{ID}}(x,y)\,dx\,dy}$.
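To make the open space risk formula in the TL;DR concrete, here is a minimal Python sketch of a Monte Carlo estimate. It rests on assumptions not stated in the summary: $f(x)$ is read as an acceptance value in $[0, 1]$ (1 when the detector treats $x$ as in-distribution), and each double integral is approximated by the sample mean of $f$ over a finite set drawn from the corresponding distribution. The function name `empirical_open_space_risk` and its arguments are hypothetical and are not part of the OpenOOD codebase.

```python
from typing import Sequence


def empirical_open_space_risk(
    accept_id: Sequence[float], accept_ood: Sequence[float]
) -> float:
    """Estimate R_O(f) from values of f(x) on ID and OOD samples.

    Assumption: each value is f(x) in [0, 1], where 1 means the detector
    accepts x as in-distribution. Each integral in the formula is replaced
    by the sample mean of f over the corresponding sample set.
    """
    if len(accept_id) == 0 or len(accept_ood) == 0:
        raise ValueError("Both sample sets must be non-empty.")

    mean_id = sum(accept_id) / len(accept_id)     # approximates the ID integral
    mean_ood = sum(accept_ood) / len(accept_ood)  # approximates the OOD integral

    total = mean_ood + mean_id
    if total == 0:
        # f rejects every sample, so the ratio is undefined.
        raise ValueError("Risk is undefined when f is zero on all samples.")
    return mean_ood / total


# Toy data: the detector accepts 3 of 4 ID samples and 1 of 4 OOD samples.
risk = empirical_open_space_risk([1, 1, 1, 0], [1, 0, 0, 0])
print(risk)  # 0.25
```

Under this reading, a detector that never accepts OOD inputs has an estimated risk of 0, and the estimate rises toward 1 as accepted mass shifts from ID samples to OOD samples.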
Abstract
Out-of-Distribution (OOD) detection is critical for the reliable operation of open-world intelligent systems. Although the number of OOD detection methods keeps growing, inconsistent evaluation makes it difficult to track progress in the field. OpenOOD v1 initiated the unification of OOD detection evaluation but was limited in scalability and scope. In response, this paper presents OpenOOD v1.5, a significant improvement over its predecessor that ensures accurate and standardized evaluation of OOD detection methodologies at large scale. Notably, OpenOOD v1.5 extends its evaluation capabilities to large-scale datasets (ImageNet) and foundation models (e.g., CLIP and DINOv2), and expands its scope to investigate full-spectrum OOD detection, which considers semantic and covariate distribution shifts at the same time. This work also contributes in-depth analysis and insights derived from comprehensive experimental results, thereby enriching the knowledge pool of OOD detection methodologies. With these enhancements, OpenOOD v1.5 aims to drive advancements and offer a more robust and comprehensive evaluation benchmark for OOD detection research.
