LLM Uncertainty Quantification through Directional Entailment Graph and Claim Level Response Augmentation
Longchao Da, Tiejin Chen, Lu Cheng, Hua Wei
TL;DR
This work addresses the trustworthiness of LLM outputs by introducing Directed Uncertainty Evaluation (D-UE), which builds a directed entailment graph from pairwise entailment probabilities and applies a Random Walk Laplacian to derive an eigenvalue-based directional uncertainty. It further integrates semantic uncertainty and uses claim extraction with claim-based augmentation to mitigate vagueness in model responses. Evaluated on multiple QA datasets, the framework improves uncertainty quantification (AUROC and AUARC) over baselines. The findings highlight the value of preserving directionality in entailment and of enriching responses with augmented claims to better reveal a model’s true awareness, which matters for practical LLM evaluation and deployment in critical domains.
Abstract
Large language models (LLMs) have shown strong capabilities on sophisticated tasks across various domains. Beyond basic question answering (QA), they are now used as decision assistants and as explainers of unfamiliar content. However, they are not always correct, owing to data sparsity in domain-specific corpora or to model hallucination. Given this, how much should we trust the responses from LLMs? This paper presents a novel way to evaluate uncertainty that captures directional instability: we construct a directed graph from entailment probabilities and, given the asymmetry of this graph, apply a Random Walk Laplacian; the uncertainty is then aggregated from the eigenvalues of that Laplacian. We also provide a way to combine the semantic uncertainty from existing work with our proposed method. In addition, this paper identifies the vagueness issue in the raw response set and proposes an augmentation approach to mitigate it. Extensive empirical experiments demonstrate the superiority of our proposed solutions.
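
The pipeline named in the abstract (directed entailment graph, Random Walk Laplacian, eigenvalue aggregation) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function names are hypothetical, the input is assumed to be a precomputed matrix whose entry (i, j) is the probability that response i entails response j, and the aggregation rule (summing max(0, 1 - |λ|) over the eigenvalues) is an assumption, since the abstract does not state how the eigenvalues are combined.

```python
import numpy as np


def random_walk_laplacian(W):
    """Return L = I - D^{-1} W for a directed, weighted adjacency matrix W.

    W[i, j] is assumed to be the probability that response i entails
    response j, so W is generally asymmetric. D is the diagonal matrix
    of out-degrees (row sums).
    """
    W = np.asarray(W, dtype=float)
    out_degree = W.sum(axis=1)
    # Guard against division by zero for nodes with no outgoing weight.
    safe_degree = np.where(out_degree > 0, out_degree, 1.0)
    transition = W / safe_degree[:, None]
    return np.eye(W.shape[0]) - transition


def directional_uncertainty(W):
    """Aggregate Laplacian eigenvalues into one score.

    ASSUMPTION: the score is sum(max(0, 1 - |lambda|)). Eigenvalues of an
    asymmetric matrix may be complex, so the modulus is used.
    """
    eigenvalues = np.linalg.eigvals(random_walk_laplacian(W))
    return float(np.sum(np.maximum(0.0, 1.0 - np.abs(eigenvalues))))
```

Under this assumed aggregation, a set of n responses that all entail one another with probability 1 scores about 1, while n responses that entail only themselves score about n, so a higher value indicates more disagreement among the sampled responses.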
