Uncertainty Estimation of Transformers' Predictions via Topological Analysis of the Attention Matrices

Elizaveta Kostenok; Daniil Cherniavskii; Alexey Zaytsev

TL;DR

This work tackles uncertainty estimation for Transformer predictions by leveraging the geometry of attention maps. It introduces topological features derived from attention graphs, including cross-barcode statistics that compare pairs of attention matrices, and trains a lightweight Score Predictor to output a confidence score without modifying the Transformer. Across adversarial text detection and acceptability judgments in English, Italian, and Russian, the topological UE method outperforms strong baselines such as Softmax, MC Dropout, and Mahalanobis, with gains up to 16% in the accuracy-rejection framework. The approach emphasizes interpretability and efficiency, revealing that information is particularly concentrated in the last-layer attention and that cross-head pairings substantially enhance uncertainty estimates, offering a practical alternative to ensembles for large-scale NLP models.

Abstract

Transformer-based language models have set new benchmarks across a wide range of NLP tasks, yet reliably estimating the uncertainty of their predictions remains a significant challenge. Existing uncertainty estimation (UE) techniques often fall short in classification tasks, either offering minimal improvements over basic heuristics or relying on costly ensemble models. Moreover, attempts to leverage common embeddings for UE in linear probing scenarios have yielded only modest gains, indicating that alternative model components should be explored. We tackle these limitations by harnessing the geometry of attention maps across multiple heads and layers to assess model confidence. Our approach extracts topological features from attention matrices, providing a low-dimensional, interpretable representation of the model's internal dynamics. Additionally, we introduce topological features to compare attention patterns across heads and layers. Our method significantly outperforms existing UE techniques on benchmarks for acceptability judgments and artificial text detection, offering a more efficient and interpretable solution for uncertainty estimation in large-scale language models.
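
As a rough illustration of what a topological feature of an attention matrix can look like, the sketch below treats a single attention head as a weighted graph and computes its finite H0 bars, which coincide with the edge weights of a minimum spanning tree. The symmetrization `1 - max(a_ij, a_ji)`, the function name `h0_bar_lengths`, and the toy matrix are assumptions made for illustration, not necessarily the construction used in the paper.

```python
def h0_bar_lengths(attention):
    """Return the finite H0 bar lengths (MST edge weights) of one attention matrix."""
    n = len(attention)
    # Assumed symmetrization: strong attention in either direction gives a short edge.
    edges = sorted(
        (1.0 - max(attention[i][j], attention[j][i]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))  # union-find forest over tokens

    def find(x):
        # Follow parent pointers to the component root, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bars = []
    for weight, i, j in edges:  # Kruskal's algorithm
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge at this threshold, so one H0 bar ends here.
            parent[root_i] = root_j
            bars.append(weight)
    return bars


# Toy 3-token attention matrix (rows sum to 1); values are made up.
attention = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.3, 0.3, 0.4],
]
bars = h0_bar_lengths(attention)
print("H0 bars:", bars, "sum:", sum(bars))
```

The sum of the bar lengths is one scalar summary per head; collecting such summaries over all heads and layers would give a low-dimensional feature vector of the kind the abstract describes.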

Paper Structure (21 sections, 3 equations, 5 figures, 5 tables)

Introduction
Related work
Uncertainty Estimation
Topological Data Analysis of Attentions
Methodology
Types of topological features and their calculation
Design of the Score Predictor model and objective function
Testing Method
Experiments
Data
Models
Baselines
Analysis of the SingleAttention features
Analysis of the PairedAttention features
Results
...and 6 more sections

Figures (5)

Figure 1: Learning confidence from the topological features of BERT attentions. To obtain UE for a fine-tuned language model, we first feed the training instances into the language model and store the resulting attention maps and the outputs of the final classification layer. Next, we preprocess the attention maps by building graph representations, computing barcodes for individual attention heads and cross-barcodes for pairs of attention heads. We then calculate a subset of topological features chosen by a feature selection strategy. Finally, we pass the precomputed topological statistics to an auxiliary model, combine its scores with the BERT outputs in the objective function, and run the optimization.
Figure 2: Clusters formed by the BERT attention heads after projecting to the two-dimensional space
Figure 3: Accuracy rejection curves of UE methods for the BERT-base model on the En-CoLA test set
Figure 4: Analysis of graph feature components using Shapley values. The number of the component is plotted along the vertical axis, and the Shapley values along the horizontal axis. The color gradient from blue to pink corresponds to the increase in the absolute value of the graph feature
Figure 5: An example of a cross-barcode between a pair of attention matrices, identified by the layer and head numbers of the Transformer, (6, 12) and (12, 6). H0 and H1 correspond to 0- and 1-dimensional homology
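
The accuracy rejection curves in Figure 3 can be read more easily with a concrete computation in mind. The sketch below shows one common way to build such a curve: sort predictions by confidence, discard the least confident fraction, and measure accuracy on the rest. The function name `accuracy_rejection_curve`, the rounding rule, and the toy data are assumptions for illustration and are not taken from the paper.

```python
def accuracy_rejection_curve(confidences, correct, rejection_rates):
    """Accuracy on the retained samples after rejecting the least confident ones."""
    # Indices ordered from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    curve = []
    for rate in rejection_rates:
        # Keep the most confident (1 - rate) share, and at least one sample.
        n_kept = max(1, round(len(order) * (1.0 - rate)))
        kept = order[:n_kept]
        curve.append(sum(correct[i] for i in kept) / len(kept))
    return curve


# Toy data: a confidence score and a 0/1 correctness flag per prediction.
confidences = [0.9, 0.8, 0.6, 0.4, 0.3]
correct = [1, 1, 0, 1, 0]
curve = accuracy_rejection_curve(confidences, correct, [0.0, 0.2, 0.4, 0.6])
print(curve)
```

A good confidence score makes the curve rise as the rejection rate grows, because the rejected predictions are mostly the wrong ones.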
