Out-of-Distribution Detection by Leveraging Between-Layer Transformation Smoothness
Fran Jelenić, Josip Jukić, Martin Tutek, Mate Puljiz, Jan Šnajder
TL;DR
This work tackles practical out-of-distribution (OOD) detection for Transformer-based text classifiers by exploiting how smoothly intermediate representations are transformed between layers. The BLOOD method quantifies this smoothness as the Frobenius norm of the layerwise Jacobians and approximates it with an unbiased estimator based on random Jacobian-vector products, so it can be applied to pre-trained models without access to training data. Experiments with RoBERTa and ELECTRA show that BLOOD, especially the BLOOD_L variant, frequently outperforms white-box baselines and remains competitive with open-box methods, with larger gains on more complex tasks and under background shift. The findings suggest that in-distribution (ID) data undergoes smoother transformations in the upper layers, reflecting the model’s focus on ID regions during training, and that dataset complexity modulates BLOOD’s effectiveness. Key contributions include: (i) BLOOD, an OOD detector for Transformers that needs only the model weights, (ii) an unbiased estimator of the Jacobian Frobenius norm that makes the computation practical, (iii) a thorough analysis of how task complexity and the type of distribution shift affect OOD detection, and (iv) a demonstration of BLOOD’s utility in both text and image modalities via cross-domain experiments.
Abstract
Effective out-of-distribution (OOD) detection is crucial for reliable machine learning models, yet most current methods are limited in practical use due to requirements like access to training data or intervention in training. We present a novel method for detecting OOD data in Transformers based on transformation smoothness between intermediate layers of a network (BLOOD), which is applicable to pre-trained models without access to training data. BLOOD utilizes the tendency of between-layer representation transformations of in-distribution (ID) data to be smoother than the corresponding transformations of OOD data, a property that we also demonstrate empirically. We evaluate BLOOD on several text classification tasks with Transformer networks and demonstrate that it outperforms methods with comparable resource requirements. Our analysis also suggests that when learning simpler tasks, OOD data transformations maintain their original sharpness, whereas sharpness increases with more complex tasks.
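The smoothness score described above rests on estimating the squared Frobenius norm of a layer's Jacobian without building the full matrix. The following is a minimal sketch of that idea, not the authors' implementation: it relies on the identity E[||Jv||^2] = ||J||_F^2 for v drawn from a standard normal distribution, and the exact form of the estimator in the paper may differ. The toy `layer` function, the weights `W`, the input `h`, and the finite-difference `jvp` helper are all hypothetical stand-ins for a real Transformer layer and an automatic-differentiation Jacobian-vector product.

```python
import math
import random


def layer(h, W):
    """Toy stand-in for one network layer: tanh(W h). Hypothetical, for illustration."""
    return [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]


def jvp(f, h, v, eps=1e-5):
    """Approximate the Jacobian-vector product J(h) v by central finite differences.

    A real implementation would use automatic differentiation instead.
    """
    plus = f([a + eps * b for a, b in zip(h, v)])
    minus = f([a - eps * b for a, b in zip(h, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]


def frobenius_sq_estimate(f, h, num_samples=4000, seed=0):
    """Monte Carlo estimate of ||J(h)||_F^2 as the mean of ||J v||^2, v ~ N(0, I)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.gauss(0.0, 1.0) for _ in h]
        total += sum(c * c for c in jvp(f, h, v))
    return total / num_samples


def frobenius_sq_exact(f, h):
    """Reference value: sum the squared Jacobian columns, one basis vector at a time."""
    total = 0.0
    for i in range(len(h)):
        e = [1.0 if j == i else 0.0 for j in range(len(h))]
        total += sum(c * c for c in jvp(f, h, e))
    return total


# Hypothetical 3x4 weight matrix and 4-dimensional input representation.
W = [
    [0.5, -0.3, 0.8, 0.1],
    [-0.2, 0.7, 0.4, -0.6],
    [0.9, 0.1, -0.5, 0.3],
]
h = [0.2, -0.4, 0.1, 0.6]


def f(x):
    return layer(x, W)


estimate = frobenius_sq_estimate(f, h)
exact = frobenius_sq_exact(f, h)
print(f"estimate={estimate:.4f} exact={exact:.4f}")
```

In this sketch a larger value means a less smooth transformation at the given input. Under the paper's hypothesis, OOD inputs would tend to score higher than ID inputs, particularly in the upper layers. The number of random samples trades accuracy for compute.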
