Mysteries of the Deep: Role of Intermediate Representations in Out of Distribution Detection

I. M. De la Jara, C. Rodriguez-Opazo, D. Teney, D. Ranasinghe, E. Abbasnejad

TL;DR

This work questions the default reliance on final-layer representations for out-of-distribution (OOD) detection and shows that intermediate-layer signals in pretrained vision–language models carry complementary information that is sensitive to distribution shift. It extends the Maximum Concept Matching (MCM) framework across multiple layers and introduces an entropy-based, training-free layer selection that automatically fuses informative layers without access to OOD data. Empirical results across diverse backbones (including CLIP) and datasets show consistent gains in both far- and near-OOD regimes, most notably for CLIP-like architectures and for near-OOD detection. The findings suggest a practical, architecture-aware path to more robust OOD detection that exploits internal model structure with only modest computational overhead. The approach is relevant to safety-critical AI systems and motivates future work on adaptive fusion policies and multi-modal OOD strategies.

Abstract

Out-of-distribution (OOD) detection is essential for reliably deploying machine learning models in the wild. Yet most methods treat large pre-trained models as monolithic encoders and rely solely on their final-layer representations for detection. We challenge this wisdom. We reveal that the intermediate layers of pre-trained models, shaped by residual connections that subtly transform input projections, can encode surprisingly rich and diverse signals for detecting distributional shifts. Importantly, to exploit latent representation diversity across layers, we introduce an entropy-based criterion to automatically identify the layers offering the most complementary information in a training-free setting, without access to OOD data. We show that selectively incorporating these intermediate representations can increase the accuracy of OOD detection by up to 10% in far-OOD and over 7% in near-OOD benchmarks compared to state-of-the-art training-free methods across various model architectures and training objectives. Our findings reveal a new avenue for OOD detection research and uncover the impact of various training objectives and model architectures on confidence-based OOD detection methods.
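To make the idea of multi-layer concept matching concrete, the following is a minimal sketch in plain Python. The single-layer score follows the usual MCM definition (the maximum softmax probability over cosine similarities between an image feature and the class text features). Everything beyond that is an assumption made for illustration and is not taken from the paper: that intermediate features are already projected into the shared image–text space, that layers are fused by averaging their per-layer softmax distributions, and that the selection criterion picks the candidate layer set with the lowest mean prediction entropy on in-distribution samples. The names `concept_probs`, `fused_probs`, and `select_layers` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def concept_probs(image_feat, text_feats, temperature=1.0):
    """Softmax over cosine similarities between one image feature and each class text feature."""
    sims = [cosine(image_feat, t) / temperature for t in text_feats]
    peak = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

def mcm_score(image_feat, text_feats, temperature=1.0):
    """Single-layer MCM score: the largest softmax probability (higher = more in-distribution)."""
    return max(concept_probs(image_feat, text_feats, temperature))

def fused_probs(layer_feats, text_feats, layers, temperature=1.0):
    """ASSUMPTION (not from the paper): fuse layers by averaging per-layer softmax distributions.

    layer_feats maps a layer index to that layer's image feature, assumed to be
    already projected into the shared image-text space.
    """
    per_layer = [concept_probs(layer_feats[l], text_feats, temperature) for l in layers]
    return [sum(column) / len(per_layer) for column in zip(*per_layer)]

def fused_mcm_score(layer_feats, text_feats, layers, temperature=1.0):
    """Maximum probability of the fused distribution."""
    return max(fused_probs(layer_feats, text_feats, layers, temperature))

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_layers(id_samples, text_feats, candidates, temperature=1.0):
    """ASSUMPTION (hypothetical criterion): choose the candidate layer set whose fused
    predictions have the lowest mean entropy on in-distribution samples only."""
    def mean_entropy(layers):
        total = sum(entropy(fused_probs(s, text_feats, layers, temperature)) for s in id_samples)
        return total / len(id_samples)
    return min(candidates, key=mean_entropy)

# Toy usage: two classes, one in-distribution sample with features from two layers.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
sample = {0: [1.0, 1.0], 1: [1.0, 0.0]}  # layer 0 is ambiguous, layer 1 is confident
chosen = select_layers([sample], text_feats, [(1,), (0, 1)])
score = fused_mcm_score(sample, text_feats, chosen)
```

The selection step uses only in-distribution samples, which mirrors the training-free, OOD-free setting described in the abstract; the actual criterion and fusion rule used by the authors may differ from this sketch.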

Paper Structure

This paper contains 26 sections, 6 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Effect of layer combination length on OOD detection across architectures. The line plot (left) shows average FPR@95 as a function of fused layer count $N$, where all combinations include the final layer (e.g., $N{=}3$ may yield layers $\{1,2,11\}$). Shaded regions indicate the full range of FPR@95 values across combinations. The gray zone marks the baseline (single-layer) case. The bar plot (right) compares baseline FPR@95 (gray) with the best result from layer fusion (colored) for each model. See Appendix \ref{appendix:layerwise_additional_1} for more details.
  • Figure 2: Layer-wise OOD detection performance across architectures. Most architectures exhibit their best performance near the final layer, while early layers generally under-perform.
  • Figure 3: Layer-wise SVCCA similarity as a function of layer distance $\Delta$ for contrastive and classic models. CLIP models (blue) exhibit lower similarity across layers, indicating more progressive transformations, while supervised models (yellow) retain higher layer-wise redundancy.
  • Figure 4: Top-1 agreement similarity across transformer layers for various vision models. Each matrix shows pairwise agreement in predicted top-1 classes across layers.
  • Figure 5: We propose a general approach to OOD detection that exploits features from intermediate layers of a visual encoder (left), extending the Maximum Concept Matching (MCM) method \cite{ming2022delvingoutofdistributiondetectionvisionlanguage}. Section \ref{sec:exploration} analyzes the informativeness of intermediate features across architectures. Based on these insights, Section \ref{sec:method} introduces an entropy-based layer selector (right) that identifies the most reliable combination of layers for training-free OOD detection.
  • ...and 2 more figures
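
The captions above rely on two quantities, FPR@95 (Figures 1 and 2) and pairwise top-1 agreement between layers (Figure 4). The sketch below shows how these are commonly computed; it is an illustration under stated assumptions, not the authors' evaluation code. It assumes that a higher score means "more in-distribution" and that the threshold is set so that 95% of in-distribution samples are accepted. The function names `fpr_at_tpr`, `top1_agreement`, and `agreement_matrix` are hypothetical.

```python
import math

def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted when `tpr` of in-distribution samples are accepted.

    ASSUMPTION: higher score = more in-distribution.
    """
    ranked = sorted(id_scores, reverse=True)
    # Number of in-distribution samples that must lie at or above the threshold;
    # the small epsilon guards against floating-point round-up in tpr * len(ranked).
    k = max(1, math.ceil(tpr * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)

def top1_agreement(preds_a, preds_b):
    """Fraction of samples on which two layers predict the same top-1 class."""
    matches = sum(1 for a, b in zip(preds_a, preds_b) if a == b)
    return matches / len(preds_a)

def agreement_matrix(layer_preds):
    """Pairwise top-1 agreement between every pair of layers.

    layer_preds is a list with one list of predicted class indices per layer.
    """
    return [[top1_agreement(p, q) for q in layer_preds] for p in layer_preds]

# Toy usage with made-up scores and predictions.
id_scores = list(range(1, 21))   # 20 in-distribution scores
ood_scores = [0, 1, 3, 5]        # 4 OOD scores
fpr95 = fpr_at_tpr(id_scores, ood_scores)
matrix = agreement_matrix([[0, 1, 2, 2], [0, 1, 1, 2]])
```

Lower FPR@95 is better, since it counts OOD samples mistaken for in-distribution ones. Low off-diagonal agreement between two layers indicates that they disagree on many samples, which is one way of reading the layer diversity that the captions describe.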