Hallucination Detection in LLMs with Topological Divergence on Attention Graphs

Alexandra Bazarova, Aleksandr Yugay, Andrey Shulga, Alina Ermilova, Andrei Volodichev, Konstantin Polev, Julia Belikova, Rauf Parchiev, Dmitry Simakov, Maxim Savchenko, Andrey Savchenko, Serguei Barannikov, Alexey Zaytsev

TL;DR

The paper addresses hallucinations in LLMs within retrieval-augmented generation by proposing TOHA, a training-free detector that exploits the topology of attention graphs to measure novelty via the topological divergence $\operatorname{MTop-Div}_G(R, P)$. It identifies a small set of hallucination-aware attention heads and averages their divergence to detect hallucinations efficiently across diverse tasks and models. Empirical results show TOHA achieving state-of-the-art or competitive performance on QA and summarization benchmarks with minimal annotated data and significantly faster inference. The work demonstrates the value of combining topological data analysis with transformer attention to assess factual reliability, offering practical benefits for safe and scalable deployment of LLMs.

Abstract

Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models (LLMs). We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting, which leverages a topological divergence metric to quantify the structural properties of graphs induced by attention matrices. Examining the topological divergence between prompt and response subgraphs reveals consistent patterns: higher divergence values in specific attention heads correlate with hallucinated outputs, independent of the dataset. Extensive experiments, including evaluation on question answering and summarization tasks, show that our approach achieves state-of-the-art or competitive results on several benchmarks while requiring minimal annotated data and computational resources. Our findings suggest that analyzing the topological structure of attention matrices can serve as an efficient and robust indicator of factual reliability in LLMs.

Paper Structure

This paper contains 46 sections, 2 theorems, 17 equations, 10 figures, 15 tables, 1 algorithm.

Key Result

Proposition 3.1

Consider an attention graph $G$ with vertex set $V_G$ and complementary vertex subsets $P, R$, where $P \cup R = V_G$ and $P \cap R = \varnothing$. Then the value of $\operatorname{MTop-Div}(R, P)$ equals the length of the minimum spanning forest (MSF) attaching $R$ to $P$.
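
The quantity in Proposition 3.1 can be illustrated with a short sketch. It is not the authors' implementation: the function names `attention_to_distance` and `msf_length` are hypothetical, and the edge weight $1 - \max(a_{ij}, a_{ji})$ is an assumption made here to turn a causal attention matrix into a symmetric distance matrix. The forest is grown Prim-style, starting with every prompt vertex already attached, which is equivalent to contracting $P$ into a single vertex and summing the edges of a minimum spanning tree.

```python
import heapq


def attention_to_distance(attn):
    """Symmetric distances from an attention matrix.

    ASSUMPTION (not from the source): weight = 1 - max(a_ij, a_ji).
    """
    n = len(attn)
    return [[1.0 - max(attn[i][j], attn[j][i]) for j in range(n)]
            for i in range(n)]


def msf_length(dist, prompt_idx, response_idx):
    """Total edge weight of the minimum spanning forest attaching the
    response vertices to the prompt vertices (Prim's algorithm, with all
    prompt vertices treated as already connected)."""
    remaining = set(response_idx)
    # Candidate edges from the attached set (initially P) to response vertices.
    heap = [(dist[p][r], r) for p in prompt_idx for r in remaining]
    heapq.heapify(heap)
    total = 0.0
    while remaining and heap:
        weight, v = heapq.heappop(heap)
        if v not in remaining:
            continue  # v was attached earlier through a cheaper edge
        remaining.discard(v)
        total += weight
        # The newly attached vertex offers edges to the still-unattached ones.
        for u in remaining:
            heapq.heappush(heap, (dist[v][u], u))
    return total


# Toy causal attention map: tokens 0-1 are the prompt, tokens 2-3 the response.
toy_attention = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.1, 0.6, 0.3, 0.0],
    [0.2, 0.3, 0.4, 0.1],
]
toy_distance = attention_to_distance(toy_attention)
toy_divergence = msf_length(toy_distance, prompt_idx=[0, 1], response_idx=[2, 3])
```

In this toy example the forest attaches token 2 to prompt token 1 and then token 3 to token 2, so response tokens that attend strongly to the prompt (directly or through other response tokens) contribute short edges and keep the divergence low.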

Figures (10)

  • Figure 1: a) An attention map. Blue and green denote the prompt and response tokens, respectively. b) The corresponding attention graph $G$. Prompt tokens $P$ are located on the left, response tokens $R$ on the right. To keep the figure neat, we only plot the edges with an attention score of no less than $0.15$. c) The minimum spanning forest attaching $R$ to $P$ and the corresponding $\operatorname{MTop-Div}$ value.
  • Figure 2: $\Delta_{ij}$ values for the $ij$-th heads. The vertical axis corresponds to the difference on dataset (B), the horizontal axis to the difference on dataset (A). The heads that separate samples best are highlighted in pink; their (layer, head) positions are given in the legend.
  • Figure 3: (a)-(b): Dependence of detection quality on the size of the probe set, models: Mistral-7B (left), Llama-2-7B (right). (c) Generalizability between the datasets, model: Mistral-7B. The vertical axis corresponds to the origin of the probe set, the horizontal axis to the test dataset.
  • Figure 4: a): Inference time comparison (seconds) for various methods evaluated on 16 MS MARCO samples using Mistral-7B. SelfCheckGPT measurement includes one additional generated answer per sample. b)-c): ROC-AUC performance of TOHA across different numbers of selected attention heads ($N_{max}$) on Mistral-7B and Llama-2-7B.
  • Figure 5: Attention to the first token (<s> in this example) for (a) a hallucinated generation and (b) a grounded one. Green highlights edges and nodes corresponding to grounded tokens, while yellow indicates hallucinated tokens. Model: Mistral-7B.
  • ...and 5 more figures

Theorems & Definitions (2)

  • Proposition 3.1
  • Proposition D.1