Unconditional Truthfulness: Learning Unconditional Uncertainty of Large Language Models

Artem Vazhentsev, Ekaterina Fadeeva, Rui Xing, Gleb Kuzmin, Ivan Lazichny, Alexander Panchenko, Preslav Nakov, Timothy Baldwin, Maxim Panov, Artem Shelmanov

TL;DR

This paper tackles the challenge of uncertainty quantification for autoregressive LLMs by addressing the conditional dependencies between generation steps. It introduces Trainable Attention-based Dependency (TAD), a supervised regression approach that learns unconditional token confidence from attention maps, current generation probabilities, and recurrent uncertainties via a two-stage training procedure. Empirical results across ten datasets and three LLMs show that TAD outperforms a wide range of unsupervised and supervised baselines in selective generation tasks, with strong robustness and cross-domain applicability, particularly in QA and MMLU scenarios. The method maintains practical efficiency, adding only about 5% runtime overhead, making it suitable for deployment in real-time generation safety pipelines; future work includes extending to retrieval-augmented generation and scaling to larger models while addressing supervision requirements.

Abstract

Uncertainty quantification (UQ) has emerged as a promising approach for detecting hallucinations and low-quality output of Large Language Models (LLMs). However, obtaining proper uncertainty scores is complicated by the conditional dependency between the generation steps of an autoregressive LLM, which is hard to model explicitly. Here, we propose to learn this dependency from attention-based features. In particular, we train a regression model that leverages LLM attention maps, probabilities on the current generation step, and recurrently computed uncertainty scores from previously generated tokens. To incorporate the recurrent features, we also suggest a two-stage training procedure. Our experimental evaluation on ten datasets and three LLMs shows that the proposed method is highly effective for selective generation, achieving substantial improvements over competing unsupervised and supervised approaches.
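
The recurrent scoring idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the feature layout, the fallback for the first token, the clipping to [0, 1], the mean-based aggregation, and the names `recurrent_confidence`, `sequence_uncertainty`, and `toy_regressor` are all assumptions made for illustration, and the two-stage training procedure is not shown. The sketch only demonstrates how a confidence score for each token could depend on the score computed for the previous token.

```python
from typing import Callable, List, Sequence


def recurrent_confidence(
    token_probs: Sequence[float],
    attn_to_prev: Sequence[Sequence[float]],
    regressor: Callable[[List[float]], float],
) -> List[float]:
    """Return one confidence score per generated token.

    token_probs[i]  : probability the LLM assigned to generated token i.
    attn_to_prev[i] : attention weights from token i to token i - 1
                      (e.g. one value per head); unused for i == 0.
    regressor       : hypothetical trained model that maps a feature
                      list to a confidence score.
    """
    if len(token_probs) != len(attn_to_prev):
        raise ValueError("token_probs and attn_to_prev must have equal length")

    confidences: List[float] = []
    for i, prob in enumerate(token_probs):
        if i == 0:
            # Assumption: with no previous token, use the raw probability.
            conf = prob
        else:
            # Features: current probability, previous (recurrent)
            # confidence, and attention to the previous token.
            features = [prob, confidences[i - 1], *attn_to_prev[i]]
            # Assumption: clip the regressor output to [0, 1].
            conf = min(1.0, max(0.0, regressor(features)))
        confidences.append(conf)
    return confidences


def sequence_uncertainty(confidences: Sequence[float]) -> float:
    """Assumed aggregation: one minus the mean token confidence."""
    if not confidences:
        raise ValueError("confidences must not be empty")
    return 1.0 - sum(confidences) / len(confidences)


def toy_regressor(features: List[float]) -> float:
    """Stand-in for a trained model: current prob times previous confidence."""
    return features[0] * features[1]


# Hypothetical values: an unsure first token followed by confident ones.
probs = [0.3, 0.9, 0.95]
attention = [[], [0.1], [0.8]]

confs = recurrent_confidence(probs, attention, toy_regressor)
raw_uncertainty = sequence_uncertainty(probs)
recurrent_uncertainty = sequence_uncertainty(confs)
```

With these made-up inputs, the low confidence of the first token carries over to later tokens, so `recurrent_uncertainty` ends up higher than `raw_uncertainty`. In practice the regressor would be trained on labeled generations rather than hand-written.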

Paper Structure (25 sections, 4 equations, 4 figures, 19 tables, 1 algorithm)

Figures (4)

  • Figure 1: An illustration of the proposed method TAD. The figure shows the generated tokens, the uncertainty scores for the generated sequence, and the probabilities assigned by an LLM and by TAD (represented with bars). The output is generated by Llama-3.1 8B for the question "What is the language with the highest number of total speakers in the world that is not an official language of the U.S.?" The LLM starts by generating the token "Spanish", which leads to the erroneous answer. The probabilities estimated by the LLM are high for all tokens except the first one, which makes the uncertainty scores based on raw probabilities misleadingly low. In contrast, TAD takes into account uncertainty from the previous step using a trainable model $C(\cdot)$ based on attention, resulting in a high overall uncertainty for the generated answer.
  • Figure 2: Summary of 33 experimental setups with various models and datasets. Each cell in the diagram shows the fraction of experiments in which the method in the row outperforms the method in the column. Warmer colors indicate better results.
  • Figure 3: Comparison of the attention weights of Llama-3.1 8B to the last preceding token for each generated token for correct and incorrect answers to input questions from the TruthfulQA dataset. The $y$-axis shows the generated tokens, and the $x$-axis represents the attention heads in the 30th layer. Warmer colors indicate higher attention values. In the incorrect answer, the model hallucinates the factually incorrect tokens "The Sahara" (the correct answer is "Antarctica"). Notably, while the 25th attention head consistently assigns high attention to the preceding token in both outputs, this attention noticeably drops for the hallucinated tokens "The Sahara". This decrease in attention could serve as a valuable signal for a hallucination detector in the TAD method.
  • Figure 4: Normalized average weights of linear regression for different attention layers in the TAD method across the considered datasets. Warmer color indicates a higher impact on the TAD performance.
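
The pairwise comparison summarized in Figure 2 can be sketched in a few lines of Python. The function name `win_rate_matrix`, the method names, and the scores below are hypothetical and not taken from the paper. The sketch assumes that higher scores are better and that a tie does not count as a win.

```python
from typing import Dict, List


def win_rate_matrix(scores: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """For each ordered pair (row, col), return the fraction of setups
    in which the row method scores strictly higher than the col method.

    scores[method][k] is the score of `method` on experimental setup k.
    """
    lengths = {len(values) for values in scores.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("every method needs the same non-zero number of scores")
    n_setups = lengths.pop()

    matrix: Dict[str, Dict[str, float]] = {}
    for row, row_scores in scores.items():
        matrix[row] = {}
        for col, col_scores in scores.items():
            if row == col:
                continue  # a method is not compared with itself
            wins = sum(1 for a, b in zip(row_scores, col_scores) if a > b)
            matrix[row][col] = wins / n_setups
    return matrix


# Hypothetical scores for two methods on three setups.
example_scores = {
    "method_a": [0.8, 0.7, 0.9],
    "method_b": [0.6, 0.75, 0.5],
}
example_matrix = win_rate_matrix(example_scores)
```

Here `example_matrix["method_a"]["method_b"]` is the share of the three setups in which `method_a` scores higher, which corresponds to a single cell of such a diagram.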