Out-of-Distribution Detection by Leveraging Between-Layer Transformation Smoothness

Fran Jelenić, Josip Jukić, Martin Tutek, Mate Puljiz, Jan Šnajder

TL;DR

This work tackles practical OOD detection for Transformer-based text classifiers by exploiting the smoothness of between-layer transformations of intermediate representations. The BLOOD method quantifies transformation smoothness with the Frobenius norm of each between-layer Jacobian and computes it with an unbiased estimator based on random Jacobian-vector products, so it can be applied to pre-trained models without access to training data. Empirical results with RoBERTa and ELECTRA show that BLOOD, especially BLOOD$_{L}$, frequently outperforms white-box baselines and remains competitive with open-box methods, with larger gains on more complex tasks and on background shifts. The findings suggest that the upper layers learn smoother transformations for ID data, reflecting the model's focus on ID regions during training, and that dataset complexity modulates BLOOD's effectiveness. Key contributions include: (i) BLOOD, an OOD detector for Transformers that needs only the model weights; (ii) an unbiased estimator of the Jacobian Frobenius norm that makes the computation practical; (iii) an analysis of how task complexity and the type of distribution shift affect OOD detection; and (iv) a demonstration of BLOOD's utility in both text and image modalities via cross-domain experiments.
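
To make the layerwise scoring idea concrete, here is a minimal sketch under loudly stated assumptions: a toy stack of `tanh(W h + b)` layers stands in for Transformer blocks, and all function names (`make_toy_layers`, `layerwise_scores`, etc.) are hypothetical. For each between-layer transformation it estimates the squared Frobenius norm of the Jacobian from random Jacobian-vector products, without ever materialising the Jacobian, and compares the estimate with the exact value. Any per-layer normalisation, token selection, or precise aggregation used by BLOOD is not reproduced here; the mean and last-layer aggregates printed at the end are illustrative assumptions only.

```python
# Minimal sketch (assumption): toy tanh layers stand in for Transformer blocks.
# All names here are hypothetical and not taken from the BLOOD codebase.
import numpy as np


def make_toy_layers(dims, seed=0):
    """Hypothetical layer stack: each layer maps h -> tanh(W @ h + b)."""
    rng = np.random.default_rng(seed)
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        W = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_out, d_in))
        b = rng.normal(scale=0.1, size=d_out)
        layers.append((W, b))
    return layers


def jacobian_frobenius_sq_exact(W, b, h):
    """Exact ||J(h)||_F^2 for tanh(W h + b), where J(h) = diag(1 - tanh^2) W."""
    s = 1.0 - np.tanh(W @ h + b) ** 2
    return float(np.sum((s[:, None] * W) ** 2))


def jacobian_frobenius_sq_estimate(W, b, h, n_samples, rng):
    """Matrix-free, unbiased estimate of ||J(h)||_F^2.

    Draws Rademacher vectors v (input side) and w (output side), both with
    zero-mean, unit-variance entries, forms J v without building J, and
    averages (w^T J v)^2 over the samples.
    """
    d_out, d_in = W.shape
    s = 1.0 - np.tanh(W @ h + b) ** 2              # derivative of tanh at W h + b
    V = rng.choice([-1.0, 1.0], size=(n_samples, d_in))
    Wr = rng.choice([-1.0, 1.0], size=(n_samples, d_out))
    JV = (V @ W.T) * s                             # row k equals J(h) @ V[k]
    proj = np.sum(Wr * JV, axis=1)                 # w_k^T J(h) v_k per sample
    return float(np.mean(proj ** 2))


def layerwise_scores(layers, x, n_samples=1000, seed=0, exact=False):
    """Smoothness score of every between-layer transformation for input x."""
    rng = np.random.default_rng(seed)
    scores, h = [], np.asarray(x, dtype=float)
    for W, b in layers:
        if exact:
            scores.append(jacobian_frobenius_sq_exact(W, b, h))
        else:
            scores.append(jacobian_frobenius_sq_estimate(W, b, h, n_samples, rng))
        h = np.tanh(W @ h + b)                     # input to the next layer
    return np.array(scores)


if __name__ == "__main__":
    layers = make_toy_layers([16, 32, 32, 8])
    x = np.random.default_rng(1).normal(size=16)
    est = layerwise_scores(layers, x, n_samples=5000)
    ref = layerwise_scores(layers, x, exact=True)
    print("estimated:", est.round(3))
    print("exact:    ", ref.round(3))
    # Illustrative aggregates (assumed): mean over layers, and last layer only.
    print("mean score:", est.mean(), "last-layer score:", est[-1])
```

The closed-form `J v` above is only available because each toy layer is an elementwise `tanh` after an affine map; with a real Transformer the same product would come from a forward-mode automatic-differentiation JVP, and the cost per sample stays at roughly one forward pass rather than one pass per Jacobian column.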

Abstract

Effective out-of-distribution (OOD) detection is crucial for reliable machine learning models, yet most current methods are limited in practical use due to requirements like access to training data or intervention in training. We present a novel method for detecting OOD data in Transformers based on transformation smoothness between intermediate layers of a network (BLOOD), which is applicable to pre-trained models without access to training data. BLOOD utilizes the tendency of between-layer representation transformations of in-distribution (ID) data to be smoother than the corresponding transformations of OOD data, a property that we also demonstrate empirically. We evaluate BLOOD on several text classification tasks with Transformer networks and demonstrate that it outperforms methods with comparable resource requirements. Our analysis also suggests that when learning simpler tasks, OOD data transformations maintain their original sharpness, whereas sharpness increases with more complex tasks.

Paper Structure (23 sections, 4 theorems, 9 equations, 10 figures, 25 tables)

Key Result

Corollary 1

Let ${\bm{J}}({\bm{x}}) \in \mathbb{R}^{m \times n}$ be a Jacobian matrix, and let ${\mathbf{v}} \in \mathbb{R}^n$ and ${\mathbf{w}} \in \mathbb{R}^m$ be random vectors whose elements are independent random variables with zero mean and unit variance. Then, $\mathbb{E}[({\mathbf{w}}^{\intercal}{\bm{J}}({\bm{x}}){\mathbf{v}})^2] = \|{\bm{J}}({\bm{x}})\|_F^2$.
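
The identity behind Corollary 1 can be checked in a few lines. The derivation below treats ${\bm{J}} = {\bm{J}}({\bm{x}})$ as fixed and assumes, as the statement implies, that all entries of ${\mathbf{v}}$ and ${\mathbf{w}}$ are mutually independent; then $\mathbb{E}[w_i w_k] = \delta_{ik}$ (zero mean kills the cross terms, unit variance gives 1 on the diagonal), and likewise $\mathbb{E}[v_j v_l] = \delta_{jl}$.

```latex
\begin{aligned}
\mathbb{E}\big[({\mathbf{w}}^{\intercal}{\bm{J}}{\mathbf{v}})^2\big]
  &= \mathbb{E}\Big[\sum_{i,k=1}^{m}\sum_{j,l=1}^{n} w_i J_{ij} v_j \, w_k J_{kl} v_l\Big] \\
  &= \sum_{i,k=1}^{m}\sum_{j,l=1}^{n} J_{ij} J_{kl}\, \mathbb{E}[w_i w_k]\, \mathbb{E}[v_j v_l] \\
  &= \sum_{i,k=1}^{m}\sum_{j,l=1}^{n} J_{ij} J_{kl}\, \delta_{ik}\, \delta_{jl}
   = \sum_{i=1}^{m}\sum_{j=1}^{n} J_{ij}^2
   = \|{\bm{J}}\|_F^2 .
\end{aligned}
```

Averaging $({\mathbf{w}}^{\intercal}{\bm{J}}{\mathbf{v}})^2$ over independent draws therefore gives an unbiased estimate of $\|{\bm{J}}\|_F^2$, and each draw needs only one Jacobian-vector product ${\bm{J}}{\mathbf{v}}$ followed by an inner product with ${\mathbf{w}}$.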

Figures (10)

  • Figure 1: The impact of change of each layer on BLOOD score across layers. Top row: Change in intermediate representations of training instances by layer for (a) RoBERTa and (b) ELECTRA. The scores are averaged across instances for the ar dataset. The black error bars denote the standard deviation. Middle row: BLOOD score by layer of models for ar before fine-tuning. Bottom row: BLOOD score by layer of models for ar after fine-tuning.
  • Figure 2: Data maps with RoBERTa for test sets of (a) bp, (b) bp2, (c) ar, ar2, mg, and mg2. Each subfigure shows data map and histograms of confidence, variability, and correctness of instances. Data maps for ELECTRA are qualitatively the same.
  • Figure 3: Box plots of change in BLOOD$_{L}$ scores with an increase in the degree of distribution shift for the tasks of semantic and background shift detection for (a) RoBERTa and (b) ELECTRA. The amount of distribution shift increases from left to right: training distribution, test ID data distribution, Near-OOD distribution, and Far-OOD distribution.
  • Figure 4: The impact of change of each layer on BLOOD score across layers. Top row: Change in intermediate representations of training instances by layer for (a) RoBERTa and (b) ELECTRA. The scores are averaged across instances for the sst dataset. The black error bars denote the standard deviation. Middle row: BLOOD score by layer of models for sst before fine-tuning. Bottom row: BLOOD score by layer of models for sst after fine-tuning.
  • Figure 5: The impact of change of each layer on BLOOD score across layers. Top row: Change in intermediate representations of training instances by layer for (a) RoBERTa and (b) ELECTRA. The scores are averaged across instances for the subj dataset. The black error bars denote the standard deviation. Middle row: BLOOD score by layer of models for subj before fine-tuning. Bottom row: BLOOD score by layer of models for subj after fine-tuning.

Theorems & Definitions (11)

  • Corollary 1
  • Remark 1
  • Theorem 1
  • Lemma 1
  • Proof
  • Lemma 2
  • Proof
  • Proof of Theorem 1
  • Remark 2
  • Remark 3