STOOD-X methodology: using statistical nonparametric test for OOD Detection Large-Scale datasets enhanced with explainability

Iván Sevillano-García, Julián Luengo, Francisco Herrera

TL;DR

STOOD-X tackles OOD detection by marrying a nonparametric, feature-space distance-based test with BLUE XAI explanations. Stage 1 uses a $k$-NN distance framework and the Wilcoxon-Mann-Whitney test to produce a statistically meaningful $p$-value-based OOD score without strong distributional assumptions, while Stage 2 provides concept- and neighbor-driven explanations to support human oversight. Through extensive OpenOOD-based experiments across CNN and transformer backbones, STOOD-X achieves competitive, and often superior, performance in high-dimensional settings and offers interpretable visualizations that reveal biases and guide debugging. The approach emphasizes trust, safety, and collaboration between humans and AI, with potential extensions to other modalities and interactive interfaces.

Abstract

Out-of-Distribution (OOD) detection is a critical task in machine learning, particularly in safety-sensitive applications where model failures can have serious consequences. However, current OOD detection methods often suffer from restrictive distributional assumptions, limited scalability, and a lack of interpretability. To address these challenges, we propose STOOD-X, a two-stage methodology that combines a Statistical nonparametric Test for OOD Detection with eXplainability enhancements. In the first stage, STOOD-X uses feature-space distances and a Wilcoxon-Mann-Whitney test to identify OOD samples without assuming a specific feature distribution. In the second stage, it generates user-friendly, concept-based visual explanations that reveal the features driving each decision, aligning with the BLUE XAI paradigm. Through extensive experiments on benchmark datasets and multiple architectures, STOOD-X achieves competitive performance against state-of-the-art post hoc OOD detectors, particularly in high-dimensional and complex settings. In addition, its explainability framework enables human oversight, bias detection, and model debugging, fostering trust and collaboration between humans and AI systems. The STOOD-X methodology therefore offers a robust, explainable, and scalable solution for real-world OOD detection tasks.
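
The Stage 1 idea can be illustrated with a small sketch. This is not the authors' implementation. It is one reading of the description of Figure 5, under our own assumptions. The test compares two sets of distances. The first set runs from a test feature vector to its $k$ nearest training features. The second set runs from each of those neighbors to its own $k$ nearest neighbors. A one-sided Wilcoxon-Mann-Whitney test then asks whether the first set tends to be larger. The test uses the normal approximation, with no tie or continuity correction. The helper names (`knn`, `average_ranks`, `wmw_greater_pvalue`, `stood_pvalue`), the Euclidean distance, and `k=10` are all hypothetical choices. In practice the inputs would be feature vectors produced by the network. Here, raw 2-D points from a toy version of the Figure 5 setup stand in for them.

```python
import math
import numpy as np


def knn(ref, queries, k):
    """Brute-force k-NN: Euclidean distances and indices of the k closest rows of `ref`."""
    d = np.sqrt(((queries[:, None, :] - ref[None, :, :]) ** 2).sum(axis=-1))
    idx = np.argsort(d, axis=1)[:, :k]
    return np.take_along_axis(d, idx, axis=1), idx


def average_ranks(x):
    """1-based ranks; tied values share the average of their ranks."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def wmw_greater_pvalue(a, b):
    """One-sided WMW p-value for H1: values in `a` tend to be larger than in `b`.

    Normal approximation, without tie or continuity correction.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    ranks = average_ranks(np.concatenate([a, b]))
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0  # Mann-Whitney U for sample `a`
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z)


def stood_pvalue(train_feats, query, k=10):
    """Hypothetical Stage 1 score: a small p-value suggests `query` is OOD."""
    d_query, idx = knn(train_feats, query[None, :], k)
    # Neighbors' own k-NN distances; column 0 is each neighbor itself (distance 0).
    d_nb, _ = knn(train_feats, train_feats[idx[0]], k + 1)
    return wmw_greater_pvalue(d_query[0], d_nb[:, 1:].ravel())


# Toy data in the spirit of Figure 5: (x, sin(x) + noise), noise std assumed to be 0.2.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0 * np.pi, 500)
train = np.column_stack([x, np.sin(x) + rng.normal(0.0, 0.2, 500)])

p_id = stood_pvalue(train, np.array([1.0, np.sin(1.0)]))  # on the curve
p_ood = stood_pvalue(train, np.array([3.0, 3.0]))         # far from the curve
print(f"ID p-value: {p_id:.3g}, OOD p-value: {p_ood:.3g}")
```

A small p-value means the test sample sits farther from the training data than its neighbors sit from one another. A quantity such as $1 - p$ or $-\log p$ could therefore serve as the OOD score. `scipy.stats.mannwhitneyu` provides exact and tie-corrected versions of the same test if SciPy is available.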

Paper Structure

This paper contains 20 sections, 1 equation, 7 figures, and 5 tables.

Figures (7)

  • Figure 1: Diagrammatic representation of the ASH methodology, which estimates OOD confidence via the Energy score computed on simplified features.
  • Figure 2: Flowchart of the STOOD-X methodology
  • Figure 3: Decomposition of the machine learning model into two separate functions, $V$ and $C$
  • Figure 4: Intuition of the behavior of ID (orange and blue) and OOD (green) samples in the feature space
  • Figure 5: Feature samples and distance distributions for the data $(x, \sin(x) + \mathcal{N}(0, 0.2))$ (blue) and an OOD sample (orange). In gray, the connections between the OOD sample and its NNs. In black, the connections between the OOD sample's NNs and their own NNs.
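
Figure 1 depicts ASH, a post hoc OOD detector. Figure 3 splits the model into two functions, $V$ and $C$. The sketch below combines both ideas under our own assumptions, none of which the figure captions specify.

- $V$ is a feature extractor whose output is non-negative (post-ReLU).
- $C$ is a linear classification head.
- ASH is shown in its ASH-S variant: keep the activations at or above a per-sample percentile and zero the rest. The kept activations are then rescaled by $\exp(s_1/s_2)$, where $s_1$ and $s_2$ are the activation sums before and after pruning.
- The score is the energy-based quantity $T \log \sum_j \exp(f_j(x)/T)$.

The function names, the 90th percentile, and the random weights are illustrative and are not taken from the paper.

```python
import numpy as np


def logsumexp(v, axis=-1):
    """Numerically stable log(sum(exp(v))) along `axis`."""
    m = np.max(v, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(v - m).sum(axis=axis))


def ash_s(feats, percentile=90.0):
    """ASH-S style shaping of non-negative activations (one row per sample)."""
    threshold = np.percentile(feats, percentile, axis=1, keepdims=True)
    s1 = feats.sum(axis=1, keepdims=True)  # activation sum before pruning
    pruned = np.where(feats >= threshold, feats, 0.0)
    s2 = np.maximum(pruned.sum(axis=1, keepdims=True), 1e-12)  # sum after pruning
    return pruned * np.exp(s1 / s2)


def energy_score(logits, temperature=1.0):
    """Negative free energy T * logsumexp(logits / T); higher means more ID-like."""
    return temperature * logsumexp(logits / temperature, axis=-1)


# Hypothetical stand-ins for the two functions of Figure 3.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 10))
b = np.zeros(10)


def V(x):
    """Stand-in feature extractor: a backbone ending in ReLU."""
    return np.maximum(x, 0.0)


def C(h):
    """Stand-in linear classification head producing logits."""
    return h @ W + b


x = rng.normal(size=(4, 64))
scores = energy_score(C(ash_s(V(x))))  # one OOD confidence value per sample
print(scores.shape)
```

ASH-S intervenes only between $V$ and $C$ and leaves both functions unchanged. This placement is what lets it act as a post hoc method on an already trained model.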