Mysteries of the Deep: Role of Intermediate Representations in Out of Distribution Detection
I. M. De la Jara, C. Rodriguez-Opazo, D. Teney, D. Ranasinghe, E. Abbasnejad
TL;DR
This work questions the default reliance on final-layer representations for out-of-distribution (OOD) detection and shows that intermediate layers of pretrained vision–language models carry complementary signals that are sensitive to distribution shift. It extends the Maximum Concept Matching (MCM) framework across multiple layers and introduces an entropy-based, training-free layer selection criterion that automatically fuses informative layers without access to OOD data. Empirical results across diverse backbones and datasets show consistent gains in both far- and near-OOD regimes, with notable improvements for CLIP-like architectures and for near-OOD detection. The findings suggest a practical, architecture-aware path to more robust OOD detection that exploits internal model structure at only modest computational overhead. The approach has potential implications for safety-critical AI systems and motivates future work on adaptive fusion policies and multi-modal OOD strategies.
Abstract
Out-of-distribution (OOD) detection is essential for reliably deploying machine learning models in the wild. Yet, most methods treat large pre-trained models as monolithic encoders and rely solely on their final-layer representations for detection. We challenge this conventional wisdom. We reveal that the \textit{intermediate layers} of pre-trained models, shaped by residual connections that subtly transform input projections, \textit{can} encode \textit{surprisingly rich and diverse signals} for detecting distributional shifts. Importantly, to exploit the diversity of latent representations across layers, we introduce an entropy-based criterion that \textit{automatically} identifies the layers offering the most complementary information in a training-free setting -- \textit{without access to OOD data}. We show that selectively incorporating these intermediate representations can increase the accuracy of OOD detection by up to \textbf{$10\%$} on far-OOD and by over \textbf{$7\%$} on near-OOD benchmarks compared to state-of-the-art training-free methods, across various model architectures and training objectives. Our findings reveal a new avenue for OOD detection research and uncover the impact of training objectives and model architectures on confidence-based OOD detection methods.
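To make the idea of multi-layer, confidence-based scoring concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each layer's image features have already been projected into the shared image-text embedding space, and it uses a stand-in selection rule (keep the `k` layers with the lowest mean predictive entropy on in-distribution images), since the abstract does not spell out the exact entropy criterion. The function names, the parameter `k`, and the simple score averaging are all hypothetical choices for illustration.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def concept_probs(image_feats, text_feats, temperature=1.0):
    # Softmax over temperature-scaled cosine similarities between
    # image features (N, D) and class text features (C, D).
    img = image_feats / np.linalg.norm(image_feats, axis=-1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    return softmax(img @ txt.T / temperature, axis=-1)

def mcm_score(image_feats, text_feats, temperature=1.0):
    # MCM-style confidence: the maximum softmax probability per image.
    return concept_probs(image_feats, text_feats, temperature).max(axis=-1)

def mean_entropy(probs, eps=1e-12):
    # Average Shannon entropy of the per-image class distributions.
    return float(-(probs * np.log(probs + eps)).sum(axis=-1).mean())

def select_layers(per_layer_feats, text_feats, k, temperature=1.0):
    # ASSUMPTION: a stand-in rule, not the paper's criterion.
    # Rank layers by mean entropy on in-distribution images, keep the k lowest.
    entropies = [
        mean_entropy(concept_probs(feats, text_feats, temperature))
        for feats in per_layer_feats
    ]
    order = np.argsort(entropies)
    return sorted(int(i) for i in order[:k])

def fused_score(per_layer_feats, text_feats, layers, temperature=1.0):
    # ASSUMPTION: fuse by averaging per-layer MCM scores over the chosen layers.
    scores = [mcm_score(per_layer_feats[i], text_feats, temperature) for i in layers]
    return np.mean(scores, axis=0)
```

In this sketch a higher fused score means the input looks more in-distribution, so an input would be flagged as OOD when its score falls below a threshold chosen on in-distribution data only.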
