
A Systematic Analysis of Out-of-Distribution Detection Under Representation and Training Paradigm Shifts

C. César Claros Olivares, Austin J. Brockmeier

TL;DR

The paper questions whether improved OOD detection hinges on complex scoring rules or on the geometry of learned representations. It introduces projection filtering and CLIP-based proximity measures to compare CNN and ViT pipelines across diverse datasets, using AURC and AUGRC as robust metrics. A rank-based statistical framework reveals that probabilistic scores dominate under near-ID conditions, while geometry-aware and prototype-based cues gain strength under stronger distribution shifts; GradNorm and KPCA RecError are notably effective for ViTs. Monte-Carlo Dropout (MCD) exhibits a class-count-dependent trade-off, and PCA projections consistently boost several detectors. The findings advocate a representation-centric view of OOD detection and yield practical guidance for regime-specific method selection and detector design.
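Probabilistic scores such as MSR (maximum softmax response) and GEN (generalized entropy) are computed directly from a classifier's softmax output, with higher values meaning "more ID-like". The sketch below is a minimal illustration, assuming raw logits for a single sample. The GEN form follows the score as it is commonly defined (the negated sum of p^γ(1−p)^γ over the top-M class probabilities); the defaults `gamma=0.1` and `top_m=100` are assumptions, not settings reported in the paper.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msr_score(logits: Sequence[float]) -> float:
    """Maximum softmax response: the largest class probability."""
    return max(softmax(logits))


def gen_score(logits: Sequence[float], gamma: float = 0.1, top_m: int = 100) -> float:
    """Negated generalized entropy over the top-M class probabilities.

    Values closer to 0 indicate a more peaked (more ID-like) distribution.
    gamma and top_m defaults are illustrative assumptions.
    """
    probs = sorted(softmax(logits), reverse=True)[:top_m]
    return -sum(p ** gamma * (1.0 - p) ** gamma for p in probs)
```

For a confident prediction such as logits `[10.0, 0.0, 0.0]`, both scores exceed those of the uniform logits `[1.0, 1.0, 1.0]`; GEN differs from MSR in that it looks at the whole top-M distribution rather than only its maximum.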

Abstract

We present a systematic comparison of out-of-distribution (OOD) detection methods across CLIP-stratified regimes using AURC and AUGRC as primary metrics. Experiments cover two representation paradigms: CNNs trained from scratch and a fine-tuned Vision Transformer (ViT), evaluated on CIFAR-10/100, SuperCIFAR-100, and TinyImageNet. Using a multiple-comparison-controlled, rank-based pipeline (Friedman test with Conover-Holm post-hoc tests) and Bron-Kerbosch clique analysis, we find that the learned feature space largely determines OOD efficacy. For both CNNs and ViTs, probabilistic scores (e.g., MSR, GEN) dominate misclassification (ID) detection. Under stronger shifts, geometry-aware scores (e.g., NNGuide, fDBD, CTM) prevail on CNNs, whereas on ViTs GradNorm and KPCA Reconstruction Error remain consistently competitive. We further show a class-count-dependent trade-off for Monte-Carlo Dropout (MCD) and that a simple PCA projection improves several detectors. These results support a representation-centric view of OOD detection and provide statistically grounded guidance for method selection under distribution shift.
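AURC and AUGRC both summarize a risk-coverage curve obtained by accepting samples in order of decreasing confidence. The sketch below is a minimal discrete estimate, not the paper's evaluation code. It assumes a binary `failure` label per sample (1 for a sample that should be rejected, for example a misclassified or OOD input) and breaks confidence ties by input order. AURC averages the selective risk (failures among accepted samples divided by the number accepted), while AUGRC averages the generalized risk (failures among accepted samples divided by the total number of samples).

```python
from typing import Sequence


def risk_coverage_areas(confidence: Sequence[float], failure: Sequence[int]) -> tuple[float, float]:
    """Return (AURC, AUGRC) as discrete averages over coverage levels k/n.

    confidence: higher means "more likely correct / in-distribution".
    failure: 1 if the sample should be rejected, else 0 (assumed labeling).
    """
    n = len(confidence)
    if n == 0 or n != len(failure):
        raise ValueError("confidence and failure must be non-empty and equal length")
    # Accept samples from most to least confident; sorted() is stable, so ties keep input order.
    order = sorted(range(n), key=lambda i: confidence[i], reverse=True)
    errors = 0
    aurc = 0.0
    augrc = 0.0
    for k, i in enumerate(order, start=1):
        errors += failure[i]
        aurc += errors / k   # selective risk at coverage k/n
        augrc += errors / n  # generalized risk: accepted AND failed
    return aurc / n, augrc / n
```

With confidences `[0.9, 0.8, 0.7, 0.6]` and failures `[0, 0, 1, 1]` (both failures ranked last), this gives AURC ≈ 0.208 and AUGRC = 0.1875; reversing the confidences raises them to ≈ 0.792 and 0.4375, so lower is better for both.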

Paper Structure

This paper contains 44 sections, 3 equations, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: Top-clique map (VGG-13; AURC/AUGRC metrics). Rows are confidence scoring functions (CSFs); columns are evaluation regimes labeled “source→test, near, mid, far”. Within each column, connected dots indicate the Conover–Holm top clique (α = 0.05). Shaded bands emphasize methods that repeatedly appear in top cliques across regimes: probability-derived CSFs dominate the ID regime, while prototype/geometry-aware methods (CTM family, NNGuide, fDBD) dominate mid/far. Larger cliques imply that many methods are statistically tied at the top; smaller cliques indicate sharper separation.
  • Figure 2: Top cliques for ViT (AUGRC/AURC). Columns are source→test, near, mid, far; rows are methods. Connected dots indicate the Conover–Holm top clique in each column; shaded bands mark coalitions that recur across regimes. ViT exhibits persistent top groups centered on GradNorm and KPCA RecError. As with VGG-13, the misclassification (ID) regime is dominated by probability-derived CSFs.
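The top cliques in both figures come from a rank-based pipeline: methods are ranked within each evaluation block, pairwise differences are tested with Conover post-hoc tests under Holm correction, and methods that are not significantly different are linked so that Bron-Kerbosch can enumerate maximal cliques. The sketch below covers only the ranking and clique steps. It assumes the Holm-adjusted pairwise p-values are already available as a matrix (the Friedman and Conover statistics themselves are not computed), and the blocking scheme is an assumption. Selecting the "top clique" as the largest maximal clique containing the best mean-ranked method, with ties broken by lower total rank, is one plausible rule and not a description of the authors' code.

```python
from typing import Sequence


def average_ranks(scores: Sequence[Sequence[float]], lower_is_better: bool = True) -> list[float]:
    """Friedman-style mean ranks: rank methods within each block, average ties, then average over blocks.

    scores[b][m] is the metric (e.g. AURC) of method m on block b.
    """
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        order = sorted(range(n_methods), key=lambda m: row[m], reverse=not lower_is_better)
        pos = 0
        while pos < n_methods:
            end = pos
            # Extend over a run of tied values; they share the average of their 1-based ranks.
            while end + 1 < n_methods and row[order[end + 1]] == row[order[pos]]:
                end += 1
            tied_rank = (pos + end) / 2 + 1
            for k in range(pos, end + 1):
                totals[order[k]] += tied_rank
            pos = end + 1
    return [t / len(scores) for t in totals]


def maximal_cliques(adj: dict[int, set[int]]) -> list[frozenset[int]]:
    """Bron-Kerbosch with pivoting; adj maps each vertex to its set of neighbours."""
    cliques: list[frozenset[int]] = []

    def expand(r: set[int], p: set[int], x: set[int]) -> None:
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex covering most candidates to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(adj), set())
    return cliques


def top_clique(mean_ranks: Sequence[float], p_values: Sequence[Sequence[float]],
               alpha: float = 0.05) -> frozenset[int]:
    """Largest maximal clique of statistically tied methods containing the best-ranked method.

    p_values[i][j] is assumed to be an already Holm-adjusted pairwise p-value.
    """
    n = len(mean_ranks)
    # Edge = "no significant difference at level alpha".
    adj = {i: {j for j in range(n) if j != i and p_values[i][j] >= alpha} for i in range(n)}
    best = min(range(n), key=lambda i: mean_ranks[i])
    candidates = [c for c in maximal_cliques(adj) if best in c]
    return max(candidates, key=lambda c: (len(c), -sum(mean_ranks[i] for i in c)))
```

For example, with four methods whose mean ranks are about 1.33, 1.67, 3 and 4, and hypothetical adjusted p-values that tie only neighbouring pairs (0–1, 1–2, 2–3), the maximal cliques are {0, 1}, {1, 2} and {2, 3}, and the top clique is {0, 1}. In practice, a dedicated library for the Friedman and Conover tests and a graph library for clique enumeration would replace these hand-written helpers.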