Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models

Artem Vazhentsev, Lyudmila Rvanova, Ivan Lazichny, Alexander Panchenko, Maxim Panov, Timothy Baldwin, Artem Shelmanov

TL;DR

This work tackles the challenge of eliciting truthful outputs from large language models by introducing a token-level density-based uncertainty quantification approach. It adapts Mahalanobis distance to per-token embeddings across multiple decoder layers, aggregates layer-wise scores via PCA, and trains a lightweight linear regressor that can incorporate sequence probability, with an optional hybrid score. Across eleven datasets and two task types, the method outperforms existing UQ approaches in both selective generation and claim-level fact-checking, while maintaining computational efficiency and demonstrating notable out-of-domain generalization. The proposed framework offers a practical, scalable pathway to improve reliability of LLM-generated content in real-world applications.

Abstract

Uncertainty quantification (UQ) is a prominent approach for eliciting truthful answers from large language models (LLMs). To date, information-based and consistency-based UQ have been the dominant UQ methods for text generation via LLMs. Density-based methods, despite being very effective for UQ in text classification with encoder-based models, have not been very successful with generative LLMs. In this work, we adapt Mahalanobis Distance (MD) - a well-established UQ technique in classification tasks - for text generation and introduce a new supervised UQ method. Our method extracts token embeddings from multiple layers of LLMs, computes MD scores for each token, and uses linear regression trained on these features to provide robust uncertainty scores. Through extensive experiments on eleven datasets, we demonstrate that our approach substantially improves over existing UQ methods, providing accurate and computationally efficient uncertainty scores for both sequence-level selective generation and claim-level fact-checking tasks. Our method also exhibits strong generalization to out-of-domain data, making it suitable for a wide range of LLM-based applications.
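
For reference, the textbook Mahalanobis distance that the abstract builds on can be written as follows. The notation is an assumption made here for illustration and is not taken from the paper: $h_t^{(l)}$ is the embedding of generated token $t$ at layer $l$, $\mu^{(l)}$ and $\Sigma^{(l)}$ are a centroid and covariance matrix estimated from training token embeddings at that layer, and $T$ is the number of generated tokens. The paper's exact variant (for example, a squared or relative distance) may differ.

$$
\mathrm{MD}^{(l)}(h_t^{(l)}) = \sqrt{\left(h_t^{(l)} - \mu^{(l)}\right)^{\top} \left(\Sigma^{(l)}\right)^{-1} \left(h_t^{(l)} - \mu^{(l)}\right)},
\qquad
\overline{\mathrm{MD}}^{(l)} = \frac{1}{T} \sum_{t=1}^{T} \mathrm{MD}^{(l)}(h_t^{(l)})
$$

The second expression is the per-layer average over tokens; collecting it across layers gives one feature per layer for the downstream regressor.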

Paper Structure

This paper contains 37 sections, 6 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: An illustration of the proposed method. After each decoder layer, the embeddings of each generated token are extracted. Subsequently, we compute the Mahalanobis distance for each token and layer and then average over all tokens in the generated sequence. Finally, we train a linear regression on the PCA decomposition of the calculated features with the addition of sequence probability to predict the uncertainty of the generation.
  • Figure 2: Performance of embeddings from various layers in density-based scores. PRR$\uparrow$ for density-based methods computed using embeddings from various layers of Llama 8b v3.1 (upper row) and Gemma 9b v2 (lower row) models. Raw ATMD/ATRMD denotes a corresponding method without selecting embeddings using the correctness criterion. Higher values indicate better results.
  • Figure 3: Dependence of PRR$\uparrow$ for the SATRMD+MSP and HUQ-SATRMD methods on the correctness threshold used to select the embeddings from which the MD centroid and covariance matrix are estimated, for the Llama 8b v3.1 model. Higher values indicate better results.
  • Figure 4: Dependence of PRR$\uparrow$ for the SATRMD+MSP and HUQ-SATRMD methods on the number of PCA components used as features for the linear regression, for the Llama 8b v3.1 model. Higher values indicate better results.
  • Figure 5: Dependence of PRR$\uparrow$ for the supervised methods on the size of the training dataset, for the Llama 8b v3.1 model. Higher values indicate better results.
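
The pipeline described in the Figure 1 caption can be sketched roughly as below. This is a minimal illustration on synthetic data, not the authors' implementation: all function names, the ridge term, the random embeddings, the random log-probabilities, and the random regression targets are assumptions introduced here. The correctness-based selection of training embeddings mentioned in Figures 2 and 3 is omitted, and PCA and least squares are done with plain NumPy.

```python
import numpy as np

def fit_layer_gaussian(tokens, ridge=1e-3):
    """Centroid and inverse covariance from (n_tokens, dim) embeddings.
    The ridge term is an assumption added for numerical stability."""
    mu = tokens.mean(axis=0)
    cov = np.cov(tokens, rowvar=False) + ridge * np.eye(tokens.shape[1])
    return mu, np.linalg.inv(cov)

def mean_token_md(tokens, mu, cov_inv):
    """Mahalanobis distance of each token, averaged over the sequence."""
    diff = tokens - mu
    sq = np.einsum("td,de,te->t", diff, cov_inv, diff)
    return np.sqrt(sq).mean()

def layer_features(sequences, params):
    """Matrix of averaged MD scores with shape (n_sequences, n_layers)."""
    return np.array([[mean_token_md(seq[l], *params[l])
                      for l in range(len(params))]
                     for seq in sequences])

def fit_pca(X, k):
    """PCA via SVD; returns the feature mean and the top-k directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T

def fit_linear(F, y):
    """Ordinary least squares with an intercept column."""
    A = np.hstack([F, np.ones((len(F), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict(F, w):
    return np.hstack([F, np.ones((len(F), 1))]) @ w

# Synthetic stand-ins (assumptions): per-sequence hidden states of shape
# (n_layers, n_tokens, dim), a sequence log-probability, and a target score.
rng = np.random.default_rng(0)
n_layers, dim, n_seq = 4, 8, 60
sequences = [rng.normal(size=(n_layers, int(rng.integers(3, 10)), dim))
             for _ in range(n_seq)]
seq_logprob = rng.normal(loc=-5.0, size=n_seq)
target = rng.uniform(size=n_seq)  # placeholder for an uncertainty label

# 1) Per-layer Gaussian fitted on all training token embeddings.
params = [fit_layer_gaussian(np.concatenate([s[l] for s in sequences]))
          for l in range(n_layers)]
# 2) Token-averaged MD per layer, 3) PCA, 4) append sequence probability.
X = layer_features(sequences, params)
pca_mean, comps = fit_pca(X, k=2)
F = np.column_stack([(X - pca_mean) @ comps, seq_logprob])
# 5) Linear regression mapping features to an uncertainty score.
w = fit_linear(F, target)
scores = predict(F, w)
```

In a real setting, the hidden states would come from the LLM's decoder layers and the regression targets from a quality or correctness signal on held-out training generations; here both are random, so the resulting scores carry no meaning beyond demonstrating the data flow.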