Detecting out-of-distribution text using topological features of transformer-based language models
Andres Pollano, Anupam Chaudhuri, Anj Simmons
TL;DR
This work investigates detecting out-of-distribution (OOD) text inputs by extracting topological features from self-attention maps in transformer models, and compares them to traditional CLS-based sentence embeddings in BERT. By transforming attention maps into graphs and applying persistent homology via Vietoris-Rips filtration, the authors derive topological feature vectors that feed distance-based OOD scores (Mahalanobis and k-NN). Empirical results show that topology-based features excel at identifying far-OOD inputs (e.g., IMDB reviews) but struggle with near-domain and same-domain shifts, while CLS embeddings perform better on near-domain and same-domain data, especially after fine-tuning. The study highlights the potential of topological methods to capture structural information in text and suggests combining topological features with semantic embeddings for robust OOD detection in NLP applications.
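As an illustration of the attention-map-to-topology step, the sketch below turns a single attention matrix into a distance matrix and computes the death times of connected components (0-dimensional persistent homology) under a Vietoris-Rips filtration. It is a minimal sketch, not the authors' implementation: the symmetrisation `1 - max(A, A^T)`, the restriction to dimension 0, the summary statistics, and all function names are assumptions made for this example. It relies on the standard fact that H0 death times of a Vietoris-Rips filtration equal the edge weights of a minimum spanning tree.

```python
import numpy as np

def attention_to_distance(attn):
    """Turn an attention map into a symmetric distance matrix.

    Assumption of this sketch: strong attention in either direction
    means two tokens are close, so distance = 1 - max(A, A^T).
    """
    sym = np.maximum(attn, attn.T)
    dist = 1.0 - sym
    np.fill_diagonal(dist, 0.0)
    return dist

def h0_death_times(dist):
    """Death times of connected components in a Vietoris-Rips filtration.

    Uses Kruskal's algorithm with union-find: every edge that merges two
    components kills one H0 class at that edge's weight.
    """
    n = dist.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(float(weight))
    return np.array(deaths)

def h0_features(attn):
    """Hypothetical feature vector: sum, mean and max of H0 death times."""
    deaths = h0_death_times(attention_to_distance(attn))
    return np.array([deaths.sum(), deaths.mean(), deaths.max()])

# Toy 3-token attention map (rows sum to 1), for illustration only.
toy_attn = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
])
toy_deaths = h0_death_times(attention_to_distance(toy_attn))
toy_features = h0_features(toy_attn)
```

In a full pipeline one such vector would be computed per attention head and the vectors concatenated; higher-dimensional homology would need a dedicated persistent homology library.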
Abstract
To safeguard machine learning systems that operate on textual data against out-of-distribution (OOD) inputs that could cause unpredictable behaviour, we explore the use of topological features of self-attention maps from transformer-based language models to detect when input text is out of distribution. Self-attention forms the core of transformer-based language models, dynamically assigning vectors to words based on context; in theory, our methodology is therefore applicable to any transformer-based language model with multi-head self-attention. We evaluate our approach on BERT and compare it to a traditional OOD approach using CLS embeddings. Our results show that our approach outperforms CLS embeddings in distinguishing in-distribution samples from far out-of-domain samples, but struggles with near-domain or same-domain datasets.
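To make the in-distribution versus OOD decision concrete, the following sketch scores a feature vector by its distance to in-distribution training features, using the two distance measures named in the TL;DR (Mahalanobis and k-NN). It is an illustrative sketch under stated assumptions, not the paper's code: the function names, the use of a pseudo-inverse for the covariance, the choice `k=5`, and the synthetic Gaussian data are all assumptions of this example. The feature vectors could be either topological features or CLS embeddings.

```python
import numpy as np

def mahalanobis_score(train, x):
    """Mahalanobis distance from x to the in-distribution feature cloud."""
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)
    # Pseudo-inverse guards against a singular covariance matrix
    # (an assumption of this sketch, not a detail from the paper).
    precision = np.linalg.pinv(cov)
    diff = x - mean
    return float(np.sqrt(max(diff @ precision @ diff, 0.0)))

def knn_score(train, x, k=5):
    """Euclidean distance from x to its k-th nearest training feature."""
    dists = np.linalg.norm(train - x, axis=1)
    return float(np.sort(dists)[k - 1])

# Synthetic stand-in for in-distribution feature vectors.
rng = np.random.default_rng(0)
train_features = rng.normal(size=(200, 4))

near_sample = np.zeros(4)       # close to the training cloud
far_sample = np.full(4, 10.0)   # far from the training cloud

near_maha = mahalanobis_score(train_features, near_sample)
far_maha = mahalanobis_score(train_features, far_sample)
near_knn = knn_score(train_features, near_sample)
far_knn = knn_score(train_features, far_sample)
```

For both scores a larger value indicates a sample further from the in-distribution data; a threshold chosen on held-out in-distribution data would then flag inputs as OOD.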
