ImageNet-OOD: Deciphering Modern Out-of-Distribution Detection Algorithms
William Yang, Byron Zhang, Olga Russakovsky
TL;DR
This work introduces ImageNet-OOD, a dataset that isolates semantic out-of-distribution (OOD) detection from covariate shift. It shows that modern detectors are disproportionately influenced by covariate shift and often offer minimal gains over the maximum softmax probability (MSP) baseline once covariate interference is controlled. ImageNet-OOD is constructed from ImageNet-21K, with classes that would contaminate it with ImageNet-1K content removed, and the authors use it to show that many previously reported improvements on semantic shift benchmarks do not carry over to OOD detection under covariate shift. Across nine detectors and thirteen architectures, detector performance is more sensitive to covariate shift (e.g., ImageNet-R) than to pure semantic shift (ImageNet-OOD), and a sanity check with random models confirms covariate-driven biases. The findings challenge the practical utility of current OOD detectors for semantic shift and stress the need for methods that robustly distinguish semantic shift from covariate-driven cues, with implications for safer, more reliable deployment of vision systems.
Abstract
The task of out-of-distribution (OOD) detection is notoriously ill-defined. Earlier works focused on new-class detection, aiming to identify label-altering data distribution shifts, also known as "semantic shift." However, recent works argue for a focus on failure detection, expanding the OOD evaluation framework to account for label-preserving data distribution shifts, also known as "covariate shift." Intriguingly, under this new framework, complex OOD detectors that were previously considered state-of-the-art now perform similarly to, or even worse than, the simple maximum softmax probability baseline. This raises the question: what are the latest OOD detectors actually detecting? Deciphering the behavior of OOD detection algorithms requires evaluation datasets that decouple semantic shift and covariate shift. To aid our investigations, we present ImageNet-OOD, a clean semantic shift dataset that minimizes the interference of covariate shift. Through comprehensive experiments, we show that OOD detectors are more sensitive to covariate shift than to semantic shift, and the benefits of recent OOD detection algorithms on semantic shift detection are minimal. Our dataset and analyses provide important insights for guiding the design of future OOD detectors.
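
For readers unfamiliar with the maximum softmax probability (MSP) baseline mentioned above, the following is a minimal sketch of how such a score is commonly computed. It is not the authors' code: the function names, the toy logits, and the 0.5 threshold are all illustrative assumptions, and in practice the threshold would be chosen on held-out data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Maximum softmax probability; higher suggests in-distribution."""
    return max(softmax(logits))


def is_ood(logits, threshold=0.5):
    """Flag an input as OOD when its MSP falls below the threshold.

    The default threshold is a hypothetical value for illustration only.
    """
    return msp_score(logits) < threshold


# Toy logits for a 3-class classifier (made-up values, not from the paper).
peaked = [6.0, 0.5, 0.1]  # one class clearly dominates
flat = [1.0, 1.1, 0.9]    # no class stands out

print(msp_score(peaked), is_ood(peaked))
print(msp_score(flat), is_ood(flat))
```

The score depends only on the classifier's output confidence, so any shift that lowers confidence, whether semantic or covariate, can lower it. That is why an evaluation set that separates the two kinds of shift is needed to tell what a detector is responding to.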
