A Systematic Analysis of Out-of-Distribution Detection Under Representation and Training Paradigm Shifts
C. César Claros Olivares, Austin J. Brockmeier
TL;DR
The paper asks whether better OOD detection depends on complex scoring rules or on the geometry of the learned representations. It introduces projection filtering and CLIP-based proximity measures to compare CNN and ViT pipelines across diverse datasets, using AURC and AUGRC as primary metrics. A rank-based statistical framework shows that probabilistic scores dominate under near-ID conditions, while geometry-aware and prototype-based cues gain strength under stronger distribution shifts; on ViTs, GradNorm and KPCA Reconstruction Error are notably effective. MCD exhibits a class-count-dependent trade-off, and PCA projections consistently boost several detectors. The findings support a representation-centric view of OOD detection and yield practical guidance for regime-specific method selection and detector design.
Abstract
We present a systematic comparison of out-of-distribution (OOD) detection methods across CLIP-stratified regimes using AURC and AUGRC as primary metrics. Experiments cover two representation paradigms: CNNs trained from scratch and a fine-tuned Vision Transformer (ViT), evaluated on CIFAR-10/100, SuperCIFAR-100, and TinyImageNet. Using a multiple-comparison-controlled, rank-based pipeline (Friedman test with Conover-Holm post-hoc) and Bron-Kerbosch cliques, we find that the learned feature space largely determines OOD efficacy. For both CNNs and ViTs, probabilistic scores (e.g., MSR, GEN) dominate misclassification (ID) detection. Under stronger shifts, geometry-aware scores (e.g., NNGuide, fDBD, CTM) prevail on CNNs, whereas on ViTs GradNorm and KPCA Reconstruction Error remain consistently competitive. We further show a class-count-dependent trade-off for Monte-Carlo Dropout (MCD) and that a simple PCA projection improves several detectors. These results support a representation-centric view of OOD detection and provide statistically grounded guidance for method selection under distribution shift.
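To make the two primary metrics concrete, the following is a minimal sketch of one common empirical estimator of AURC and AUGRC, not the authors' evaluation code. It assumes each sample carries a confidence score (higher means more trusted) and a 0/1 failure flag; what counts as a failure (a misclassification, an OOD input, or both) is left to the caller. The function name `risk_coverage_areas` and the toy data are hypothetical, and integration conventions (for example trapezoidal versus a plain average over coverage levels) vary between implementations.

```python
def risk_coverage_areas(confidences, errors):
    """Return (aurc, augrc) from per-sample confidences and 0/1 failure flags.

    Assumed convention: samples are accepted in order of decreasing
    confidence, and each area is a plain average over the n coverage levels.
    """
    if len(confidences) != len(errors) or len(confidences) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(confidences)
    # Indices sorted so the most confident sample is accepted first.
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)

    cum_errors = 0
    selective_risk_sum = 0.0    # failures / accepted samples, summed over k
    generalized_risk_sum = 0.0  # accepted failures / all samples, summed over k
    for k, i in enumerate(order, start=1):
        cum_errors += errors[i]
        selective_risk_sum += cum_errors / k
        generalized_risk_sum += cum_errors / n
    return selective_risk_sum / n, generalized_risk_sum / n


# Toy data (hypothetical): the two failures receive the lowest confidence.
confidences = [0.9, 0.8, 0.6, 0.3]
errors = [0, 0, 1, 1]
aurc, augrc = risk_coverage_areas(confidences, errors)

# Reversing the ranking gives the failures the highest confidence instead.
bad_aurc, bad_augrc = risk_coverage_areas([0.3, 0.6, 0.8, 0.9], errors)
```

Lower values are better for both quantities: a score that ranks failures last keeps the accepted set clean for longer, so the well-ranked toy example yields smaller areas than the reversed one.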
