Estimating LLM Uncertainty with Evidence
Huan Ma, Jingdong Chen, Joey Tianyi Zhou, Guangyu Wang, Changqing Zhang
TL;DR
The paper addresses hallucinations in LLMs, arguing that probability-based uncertainty metrics fall short because normalization discards the strength of the evidence behind each token. It introduces LogTokU, a logits-based framework that treats logits as evidence for a Dirichlet distribution and decouples token-level uncertainty into aleatoric and epistemic components, which together define four quadrants. The authors apply LogTokU to two downstream tasks: (1) dynamic decoding, which adapts the sampling strategy to the estimated uncertainty to balance diversity and accuracy, and (2) reliability estimation, which aggregates token-level uncertainty into a sentence-level reliability score, demonstrated on the SemEval and TruthfulQA benchmarks. The reported results show LogTokU outperforming baselines in both decoding and reliability estimation across multiple LLM sizes, and point to its efficiency for uncertainty-aware generation.
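As a rough illustration of the evidence view, the sketch below treats the top-k logits of one decoding step as Dirichlet evidence and derives two scores from them. It is not the paper's implementation: the ReLU-plus-one mapping from logits to Dirichlet parameters, the normalized-entropy proxy for aleatoric uncertainty, the K-over-strength proxy for epistemic uncertainty, the 0.5 thresholds, and the names `dirichlet_uncertainty` and `quadrant` are all assumptions chosen for simplicity.

```python
import math


def dirichlet_uncertainty(logits, top_k=2):
    """Hypothetical helper: score one token from its top-k logits.

    Returns (aleatoric, epistemic), both in [0, 1] under the assumptions below.
    """
    if top_k < 2:
        raise ValueError("top_k must be at least 2")
    top = sorted(logits, reverse=True)[:top_k]
    # Assumption: evidence is the non-negative part of each logit,
    # and the Dirichlet parameter is evidence + 1.
    alphas = [max(z, 0.0) + 1.0 for z in top]
    strength = sum(alphas)
    expected = [a / strength for a in alphas]
    # Aleatoric proxy: entropy of the expected probabilities, scaled by log(K).
    entropy = -sum(p * math.log(p) for p in expected)
    aleatoric = entropy / math.log(len(alphas))
    # Epistemic proxy: K / total strength, which shrinks as evidence grows.
    epistemic = len(alphas) / strength
    return aleatoric, epistemic


def quadrant(aleatoric, epistemic, au_threshold=0.5, eu_threshold=0.5):
    """Label a token by quadrant; both thresholds are illustrative assumptions."""
    au = "high" if aleatoric >= au_threshold else "low"
    eu = "high" if epistemic >= eu_threshold else "low"
    return f"{au}-AU/{eu}-EU"


# Two tokens whose top-2 candidates are tied, with very different logit magnitudes.
for name, logits in [("strong evidence", [10.0, 10.0, 0.1]),
                     ("weak evidence", [0.5, 0.5, 0.1])]:
    au, eu = dirichlet_uncertainty(logits)
    print(name, round(au, 3), round(eu, 3), quadrant(au, eu))
```

Both tokens split their evidence evenly and so share the same aleatoric score, yet the one backed by larger logits gets a much lower epistemic score. A normalized probability alone cannot express that difference.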
Abstract
Over the past few years, Large Language Models (LLMs) have developed rapidly and are widely applied in various domains. However, LLMs suffer from hallucinations, generating responses that may be unreliable when the models lack relevant knowledge. To detect potential hallucinations, uncertainty estimation methods have been introduced, and most of them have confirmed that reliability lies in critical tokens. However, probability-based methods perform poorly in identifying token reliability, limiting their practical utility. In this paper, we reveal that probability-based methods fail to estimate token reliability because they lose the evidence strength information accumulated during the training stage. Therefore, we present Logits-induced token uncertainty (LogTokU), a framework for estimating decoupled token uncertainty in LLMs, enabling real-time uncertainty estimation without requiring multiple sampling processes. We employ evidence modeling to implement LogTokU and use the estimated uncertainty to guide downstream tasks. The experimental results demonstrate that LogTokU is effective and promising.
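The claim that probabilities lose evidence strength can be made concrete with a well-known property of softmax: adding a constant to every logit leaves the output unchanged. The sketch below is a minimal demonstration of that property, assuming softmax is the normalization in question. The example logits and the use of the largest raw logit as a stand-in for evidence strength are illustrative assumptions, not details taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


weak = [1.0, 0.0, -1.0]       # small logits
strong = [11.0, 10.0, 9.0]    # the same logits shifted by +10

p_weak = softmax(weak)
p_strong = softmax(strong)

# The two probability vectors match, so any metric computed from them
# (max probability, entropy, ...) cannot tell the two tokens apart.
same_probs = all(math.isclose(a, b) for a, b in zip(p_weak, p_strong))
print("identical probabilities:", same_probs)

# The raw logits still differ, which is the information a logits-based
# method can use and a probability-based method has already discarded.
print("largest logit:", max(weak), "vs", max(strong))
```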
