Uncertainty Estimation of Transformers' Predictions via Topological Analysis of the Attention Matrices
Elizaveta Kostenok, Daniil Cherniavskii, Alexey Zaytsev
TL;DR
This work tackles uncertainty estimation for Transformer predictions by leveraging the geometry of attention maps. It introduces topological features derived from attention graphs, including cross-barcode statistics that compare pairs of attention matrices, and trains a lightweight Score Predictor to output a confidence score without modifying the Transformer. On artificial text detection and on acceptability judgments in English, Italian, and Russian, the topological UE method outperforms strong baselines such as Softmax, MC Dropout, and Mahalanobis, with gains of up to 16% in the accuracy-rejection framework. The approach emphasizes interpretability and efficiency: the analysis shows that the most useful information is concentrated in last-layer attention and that cross-head pairings substantially improve the uncertainty estimates. This makes the method a practical alternative to ensembles for large-scale NLP models.
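The accuracy-rejection framework mentioned above can be illustrated with a minimal sketch: examples are ranked by confidence, the least confident fraction is rejected, and accuracy is measured on what remains. The function name `accuracy_rejection_curve` and the toy data are hypothetical and chosen only for illustration; the paper's exact evaluation protocol may differ.

```python
def accuracy_rejection_curve(confidences, correct, rejection_rates):
    """Return accuracy on the retained examples for each rejection rate."""
    # Rank example indices from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    n = len(order)
    curve = []
    for rate in rejection_rates:
        # Keep the most confident (1 - rate) fraction, and at least one example.
        keep = max(1, int(round(n * (1.0 - rate))))
        kept = order[:keep]
        curve.append(sum(correct[i] for i in kept) / keep)
    return curve


# Toy data, made up for illustration: a confidence score per prediction
# and a flag saying whether that prediction was correct.
confidences = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20]
correct = [1, 1, 1, 0, 1, 0, 1, 0]
rates = [0.0, 0.25, 0.5]

print(accuracy_rejection_curve(confidences, correct, rates))
```

A useful confidence score makes accuracy rise as more low-confidence examples are rejected. In this toy case accuracy goes from 5/8 with no rejection to 3/4 when half of the examples are rejected.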
Abstract
Transformer-based language models have set new benchmarks across a wide range of NLP tasks, yet reliably estimating the uncertainty of their predictions remains a significant challenge. Existing uncertainty estimation (UE) techniques often fall short in classification tasks, either offering minimal improvements over basic heuristics or relying on costly ensemble models. Moreover, attempts to leverage common embeddings for UE in linear probing scenarios have yielded only modest gains, indicating that alternative model components should be explored. We tackle these limitations by harnessing the geometry of attention maps across multiple heads and layers to assess model confidence. Our approach extracts topological features from attention matrices, providing a low-dimensional, interpretable representation of the model's internal dynamics. Additionally, we introduce topological features to compare attention patterns across heads and layers. Our method significantly outperforms existing UE techniques on benchmarks for acceptability judgments and artificial text detection, offering a more efficient and interpretable solution for uncertainty estimation in large-scale language models.
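To make "topological features from attention matrices" concrete, here is a minimal sketch of one simple feature of this kind: the finite H0 bars of a graph filtration built from a single attention matrix. It relies on the standard fact that, for a weighted graph, the H0 death times coincide with the edge weights of a minimum spanning tree. The distance transformation `1 - max(a_ij, a_ji)`, the function name `h0_bar_lengths`, and the toy matrix are assumptions made for illustration and are not taken from this section. The cross-barcode features that compare two attention matrices are not shown.

```python
def h0_bar_lengths(attention):
    """Finite H0 bar lengths of a graph filtration built from one attention matrix."""
    n = len(attention)
    # Assumption: symmetrize attention into a distance d_ij = 1 - max(a_ij, a_ji),
    # so that strongly attending token pairs are close to each other.
    edges = sorted(
        (1.0 - max(attention[i][j], attention[j][i]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bars = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge at this threshold, so one H0 bar
            # that was born at 0 dies here (Kruskal's algorithm).
            parent[ri] = rj
            bars.append(dist)
    return bars


# Toy 3-token attention matrix (rows sum to 1), made up for illustration.
attention = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.1, 0.4],
]

bars = h0_bar_lengths(attention)
# Low-dimensional summary of the head: total and mean H0 bar length.
features = {"h0_sum": sum(bars), "h0_mean": sum(bars) / len(bars)}
print(features)
```

Features like these, computed per head and per layer, would then form the input to a small downstream confidence model. In practice the attention matrices would come from the Transformer itself, and a persistent homology library would typically replace the hand-written union-find.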
