STOOD-X methodology: using statistical nonparametric test for OOD Detection Large-Scale datasets enhanced with explainability
Iván Sevillano-García, Julián Luengo, Francisco Herrera
TL;DR
STOOD-X tackles OOD detection by marrying a nonparametric, feature-space distance-based test with BLUE XAI explanations. Stage 1 uses a $k$-NN distance framework and the Wilcoxon-Mann-Whitney test to produce a statistically meaningful $p$-value-based OOD score without strong distributional assumptions, while Stage 2 provides concept- and neighbor-driven explanations to support human oversight. Through extensive OpenOOD-based experiments across CNN and transformer backbones, STOOD-X achieves competitive, and often superior, performance in high-dimensional settings and offers interpretable visualizations that reveal biases and guide debugging. The approach emphasizes trust, safety, and collaboration between humans and AI, with potential extensions to other modalities and interactive interfaces.
Abstract
Out-of-Distribution (OOD) detection is a critical task in machine learning, particularly in safety-sensitive applications where model failures can have serious consequences. However, current OOD detection methods often suffer from restrictive distributional assumptions, limited scalability, and a lack of interpretability. To address these challenges, we propose STOOD-X, a two-stage methodology that combines a Statistical nonparametric Test for OOD Detection with eXplainability enhancements. In the first stage, STOOD-X uses feature-space distances and a Wilcoxon-Mann-Whitney test to identify OOD samples without assuming a specific feature distribution. In the second stage, it generates user-friendly, concept-based visual explanations that reveal the features driving each decision, aligning with the BLUE XAI paradigm. Through extensive experiments on benchmark datasets and multiple architectures, STOOD-X achieves competitive performance against state-of-the-art post hoc OOD detectors, particularly in high-dimensional and complex settings. In addition, its explainability framework enables human oversight, bias detection, and model debugging, fostering trust and collaboration between humans and AI systems. The STOOD-X methodology therefore offers a robust, explainable, and scalable solution for real-world OOD detection tasks.
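To make the first stage concrete, the following is a minimal sketch of a $k$-NN distance test of this kind, not the authors' implementation. It assumes details the abstract does not specify: Euclidean distance, a baseline pooled from the $k$-NN distances of held-out in-distribution samples, a one-sided test, and a normal approximation for the Wilcoxon-Mann-Whitney $p$-value. All function names (`knn_distances`, `mann_whitney_greater_pvalue`, `ood_p_value`) and the synthetic Gaussian "features" are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import NormalDist


def knn_distances(x, reference, k):
    """Return the k smallest Euclidean distances from x to the reference points."""
    return sorted(math.dist(x, r) for r in reference)[:k]


def mann_whitney_greater_pvalue(sample, baseline):
    """One-sided Wilcoxon-Mann-Whitney p-value for 'sample tends to exceed baseline'.

    Assumption: normal approximation with continuity correction and no tie
    correction, which is reasonable for continuous-valued distances.
    """
    n1, n2 = len(sample), len(baseline)
    # U statistic: number of (sample, baseline) pairs where the sample value is larger.
    u = sum((s > b) + 0.5 * (s == b) for s in sample for b in baseline)
    mean = n1 * n2 / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean - 0.5) / std
    return 1.0 - NormalDist().cdf(z)


def ood_p_value(x, train, baseline, k):
    """Small p-values suggest x lies farther from the training features than ID data does."""
    return mann_whitney_greater_pvalue(knn_distances(x, train, k), baseline)


# Synthetic stand-in for feature vectors (assumption: 8-dimensional standard Gaussian).
rng = random.Random(0)
dim, k = 8, 10


def gaussian_point():
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


train = [gaussian_point() for _ in range(200)]
held_out = [gaussian_point() for _ in range(30)]

# Baseline: pooled k-NN distances of held-out in-distribution points to the training set.
baseline = [d for h in held_out for d in knn_distances(h, train, k)]

p_id = ood_p_value([0.0] * dim, train, baseline, k)   # point at the center of the ID cloud
p_ood = ood_p_value([6.0] * dim, train, baseline, k)  # point shifted far from the ID cloud
print(f"p-value (in-distribution point): {p_id:.4f}")
print(f"p-value (shifted point):         {p_ood:.2e}")
```

In this sketch a low $p$-value flags a sample as likely OOD. Pooling distances from several held-out points makes the baseline values mutually dependent, so the resulting $p$-values are best read as scores rather than exactly calibrated probabilities. In practice, `scipy.stats.mannwhitneyu` with `alternative="greater"` could replace the hand-written test.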
