Detecting out-of-distribution text using topological features of transformer-based language models

Andres Pollano, Anupam Chaudhuri, Anj Simmons

TL;DR

This work investigates detecting out-of-distribution (OOD) text inputs by extracting topological features from the self-attention maps of transformer models, and compares these features to traditional CLS-based sentence embeddings in BERT. By transforming attention maps into graphs and applying persistent homology via Vietoris-Rips filtration, the authors derive topological feature vectors that feed distance-based OOD scores (Mahalanobis and k-NN). Empirical results show that topology-based features excel at identifying far out-of-domain inputs (e.g., IMDB reviews) but struggle with near-domain and same-domain shifts, while CLS embeddings perform better on near-domain and same-domain data, especially after fine-tuning. The study highlights the potential of topological methods to capture structural information in text and suggests combining topological features with semantic embeddings for robust OOD detection in NLP applications.
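The distance-based scoring step named above can be sketched as follows. This is a minimal illustration using NumPy on synthetic feature vectors, not the authors' implementation: the function names, the ridge term added to the covariance, the choice of `k`, and the synthetic data are all assumptions made for the example. In practice the feature vectors would be either topological features or CLS embeddings.

```python
import numpy as np

def fit_mahalanobis(train_feats):
    """Estimate the mean and inverse covariance of in-distribution features."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    # Small ridge term keeps the covariance invertible (an assumption for this sketch).
    cov += 1e-6 * np.eye(cov.shape[0])
    return mu, np.linalg.inv(cov)

def mahalanobis_score(x, mu, cov_inv):
    """Squared Mahalanobis distance of each row of x to the training mean."""
    d = x - mu
    return np.einsum("...i,ij,...j->...", d, cov_inv, d)

def knn_score(x, train_feats, k=5):
    """Euclidean distance from each row of x to its k-th nearest training point."""
    dists = np.linalg.norm(train_feats[None, :, :] - x[:, None, :], axis=-1)
    return np.sort(dists, axis=1)[:, k - 1]

# Synthetic stand-ins for feature vectors (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 8))    # in-distribution training features
in_dist = rng.normal(0.0, 1.0, size=(100, 8))  # held-out in-distribution features
far_ood = rng.normal(6.0, 1.0, size=(100, 8))  # strongly shifted features

mu, cov_inv = fit_mahalanobis(train)
maha_in = mahalanobis_score(in_dist, mu, cov_inv)
maha_ood = mahalanobis_score(far_ood, mu, cov_inv)
knn_in = knn_score(in_dist, train)
knn_ood = knn_score(far_ood, train)
```

Under both scores a higher value means the input lies further from the in-distribution training data, so a threshold on the score separates in-distribution from OOD inputs.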

Abstract

To safeguard machine learning systems that operate on textual data against out-of-distribution (OOD) inputs that could cause unpredictable behaviour, we explore the use of topological features of self-attention maps from transformer-based language models to detect when input text is out of distribution. Self-attention forms the core of transformer-based language models, dynamically assigning vectors to words based on context; in theory, our methodology is therefore applicable to any transformer-based language model with multi-head self-attention. We evaluate our approach on BERT and compare it to a traditional OOD approach using CLS embeddings. Our results show that our approach outperforms CLS embeddings in distinguishing in-distribution samples from far-out-of-domain samples, but struggles with near-domain or same-domain datasets.

Paper Structure

This paper contains 17 sections, 4 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Process of transforming an attention map to an attention graph (one per attention head)
  • Figure 2: Filtration process for the attention graph (Layer 7; Head 10), where edges with distances below a threshold are added first, gradually connecting the nodes until a complete graph is formed
  • Figure 3: Example persistence diagram and extracted topological features
  • Figure 4: The data representations from the TDA and CLS approaches for the far out-of-domain IMDB dataset.
  • Figure 5: The data representations from the TDA and CLS approaches for the near out-of-domain CNN/Dailymail dataset.
  • ...and 1 more figure
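The pipeline outlined in Figures 1 to 3 (attention map, attention graph, filtration, topological features) can be sketched for the 0-dimensional case as follows. This is an illustration under stated assumptions, not the paper's code: the symmetrisation rule `1 - max(a_ij, a_ji)`, the function names, the random attention map, and the summary statistics are all hypothetical choices for the example. It relies on a standard fact: in a Vietoris-Rips filtration every connected component is born at 0, and the components die at the edge weights of the minimum spanning tree, so a Kruskal-style pass with union-find recovers the H0 death times.

```python
import numpy as np

def attention_to_distance(attn):
    """Turn a square attention map into a symmetric distance matrix.

    Assumed rule for this sketch: distance = 1 - max(a_ij, a_ji).
    """
    sym = np.maximum(attn, attn.T)
    dist = 1.0 - sym
    np.fill_diagonal(dist, 0.0)
    return dist

def h0_death_times(dist):
    """Death times of connected components (H0) in the Vietoris-Rips filtration."""
    n = dist.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    iu, ju = np.triu_indices(n, k=1)
    order = np.argsort(dist[iu, ju])  # add edges from shortest to longest
    deaths = []
    for idx in order:
        a, b = find(int(iu[idx])), find(int(ju[idx]))
        if a != b:
            # This edge merges two components, so one component dies here.
            parent[a] = b
            deaths.append(float(dist[iu[idx], ju[idx]]))
    return np.array(deaths)

# Hypothetical 6-token attention map: row-wise softmax of random logits.
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 6))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

dist = attention_to_distance(attn)
deaths = h0_death_times(dist)

# Illustrative summary statistics of the H0 bars (names are assumptions).
features = {
    "h0_sum": deaths.sum(),
    "h0_mean": deaths.mean(),
    "h0_max": deaths.max(),
}
```

Higher-dimensional features (for example, loops in H1) need a dedicated persistent homology library; this sketch covers only connected components to stay dependency-free.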