Spurious-Aware Prototype Refinement for Reliable Out-of-Distribution Detection

Reihaneh Zohrabi, Hosein Hasani, Mahdieh Soleymani Baghshah, Anna Rohrbach, Marcus Rohrbach, Mohammad Hossein Rohban

TL;DR

SPROD tackles OOD detection under unknown spurious correlations by post-hoc refining class prototypes through three stages to capture subgroup structure and debias distances. It relies on distance-based scoring against multiple group prototypes, avoiding softmax confidence and retraining, to improve separation between ID and OOD samples. The method is validated on five SP-OOD benchmarks, including the new Animals MetaCoCo dataset, and shows consistent gains over 19 baselines across diverse backbones and settings, including NSP-OOD and conventional OOD tasks. These results demonstrate a scalable, data-efficient approach to robust OOD detection with real-world applicability. SPROD also provides theoretical and empirical insights into how subgroup-aware prototypes reduce spurious bias and enhance reliability in challenging SP-OOD scenarios.

Abstract

Out-of-distribution (OOD) detection is crucial for ensuring the reliability and safety of machine learning models in real-world applications, where they frequently face data distributions unseen during training. Despite progress, existing methods are often vulnerable to spurious correlations that mislead models and compromise robustness. To address this, we propose SPROD, a novel prototype-based OOD detection approach that explicitly addresses the challenge posed by unknown spurious correlations. Our post-hoc method refines class prototypes to mitigate bias from spurious features without additional data or hyperparameter tuning, and is broadly applicable across diverse backbones and OOD detection settings. We conduct a comprehensive benchmark of OOD detection under spurious correlations, comparing our method against existing approaches and demonstrating its superior performance across challenging OOD datasets, such as CelebA, Waterbirds, UrbanCars, Spurious ImageNet, and the newly introduced Animals MetaCoCo. On average, SPROD improves AUROC by 4.8% and FPR@95 by 9.4% over the second-best method.
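
The two metrics quoted in the abstract can be illustrated with a short sketch. It assumes a convention the page does not spell out: each sample gets a scalar score where a higher value means "more likely OOD" (as with a distance to prototypes), ID samples are treated as the positive class for the 95% true-positive-rate threshold, and FPR@95 is the fraction of OOD samples still accepted as ID at that threshold. The function names `auroc` and `fpr_at_95_tpr` are hypothetical helpers, not part of any released SPROD code.

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """Probability that a random OOD sample scores higher than a random ID sample.

    Ties count as 0.5. Uses an O(n_id * n_ood) pairwise comparison, which is
    fine for illustration but not for very large evaluation sets.
    """
    id_s = np.asarray(id_scores, dtype=float)[:, None]
    ood_s = np.asarray(ood_scores, dtype=float)[None, :]
    return float(((ood_s > id_s) + 0.5 * (ood_s == id_s)).mean())

def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted as ID when 95% of ID samples are accepted."""
    # Threshold below which 95% of ID scores fall (assumed convention).
    thr = np.percentile(np.asarray(id_scores, dtype=float), 95)
    return float((np.asarray(ood_scores, dtype=float) <= thr).mean())

# Toy scores, invented purely for illustration.
id_scores = [0.1, 0.2, 0.3, 0.4]
ood_scores = [0.35, 0.9, 1.0, 1.2]
print(auroc(id_scores, ood_scores), fpr_at_95_tpr(id_scores, ood_scores))
```

Higher AUROC and lower FPR@95 both indicate better separation between ID and OOD scores.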

Paper Structure

This paper contains 30 sections, 25 equations, 13 figures, 36 tables, 1 algorithm.

Figures (13)

  • Figure 1: The challenge of spurious correlations in OOD detection. ID classes (dog, fox, wolf) appear in correlated backgrounds (grass, autumn, snow), with majority groups relying on context shortcuts (blue frames). SP-OOD samples share the same contextual backgrounds, making detection more difficult. NSP-OOD samples differ in context and lack both spurious and core features.
  • Figure 2: (a) A far-OOD sample may receive a high softmax score, similar to a near-boundary ID sample. (b) Distances to class prototypes offer a more consistent separation of OOD samples. (c) In the SP-OOD setting, the problem is even more severe: A biased decision boundary causes the OOD sample to receive high softmax confidence, while a minority ID sample receives lower confidence.
  • Figure 3: Overview of the three main stages of SPROD. In the first stage, class prototypes are computed, though they may be biased due to spurious correlations. In the second stage, group prototypes are constructed for the misclassified and correctly classified samples of each class. Finally, in the third stage, class samples are reassigned to their nearest group prototypes, and based on these assignments, refined minority and majority prototypes are recalculated.
  • Figure 4: Effect of backbone fine-tuning and spurious correlation on SP-OOD detection using the Waterbirds dataset. Left: ResNet-50; right: ResNet-18. Each pair shows results under 50% (left) and 90% (right) spurious correlation in ID data. Fine-tuned models are marked with a hatch texture.
  • Figure 5: Comparison of generative and discriminative scoring for OOD detection using SPROD. (a) Histograms of ID and OOD sample scores using the distance-based generative approach and the softmax-based discriminative approach, both computed with SPROD on the Waterbirds dataset. (b) Performance comparison between the generative (distance-based) and discriminative (softmax-based) scoring variants of SPROD across the five SP-OOD benchmark datasets.
  • ...and 8 more figures
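
The three stages described in the Figure 3 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: prototypes are taken to be plain means of precomputed feature embeddings, "nearest" is taken to be Euclidean, and the OOD score is assumed to be the distance to the closest refined prototype. The names `refine_prototypes` and `ood_score` are hypothetical.

```python
import numpy as np

def _nearest(x, protos):
    # Pairwise Euclidean distances, shape (n_samples, n_prototypes).
    d = np.linalg.norm(x[:, None, :] - protos[None, :, :], axis=-1)
    return d.argmin(axis=1), d.min(axis=1)

def refine_prototypes(feats, labels):
    """Return refined group prototypes from ID features and class labels."""
    classes = np.unique(labels)
    # Stage 1: one (possibly biased) prototype per class.
    class_protos = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    pred = classes[_nearest(feats, class_protos)[0]]

    refined = []
    for c in classes:
        x = feats[labels == c]
        correct = pred[labels == c] == c
        # Stage 2: group prototypes for correctly classified and
        # misclassified samples of this class (skip an empty group).
        groups = np.stack([x[m].mean(axis=0) for m in (correct, ~correct) if m.any()])
        # Stage 3: reassign class samples to the nearest group prototype
        # and recompute the prototypes from those assignments.
        assign, _ = _nearest(x, groups)
        for g in range(len(groups)):
            if (assign == g).any():
                refined.append(x[assign == g].mean(axis=0))
    return np.stack(refined)

def ood_score(x, protos):
    # Assumed scoring rule: distance to the closest prototype (higher = more OOD).
    return _nearest(x, protos)[1]

# Toy 2-D features: axis 0 plays the spurious feature, axis 1 the core feature.
feats = np.array([[0, 0], [0, 0], [0, 0], [10, 0],
                  [10, 1], [10, 1], [10, 1], [0, 1]], dtype=float)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
protos = refine_prototypes(feats, labels)
print(protos.shape, ood_score(np.array([[10.0, 0.0], [5.0, 20.0]]), protos))
```

In this toy data the minority sample of each class sits closer to the other class's Stage 1 prototype, so it is misclassified and ends up with its own prototype after refinement. A minority ID sample then scores as close to a prototype, while a point far from every group scores as distant.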