LLM Uncertainty Quantification through Directional Entailment Graph and Claim Level Response Augmentation

Longchao Da, Tiejin Chen, Lu Cheng, Hua Wei

TL;DR

This work addresses the trustworthiness of LLM outputs by introducing Directed Uncertainty Evaluation (D-UE), which builds a directed entailment graph from pairwise entailment probabilities and uses a Random Walk Laplacian to produce eigenvalue-based directional uncertainty. It further integrates semantic uncertainty and introduces a claim-based augmentation to mitigate vagueness in model responses. The framework is complemented by claim extraction and augmentation techniques, and is evaluated on multiple QA datasets showing improved uncertainty quantification (AUROC and AUARC) over baselines. The findings highlight the value of preserving directionality in entailment and enriching responses with augmented claims to better reveal a model’s true awareness, impacting practical LLM evaluation and deployment in critical domains.

Abstract

Large language models (LLMs) have shown strong capabilities on sophisticated tasks across various domains. Beyond basic question answering (QA), they are now used as decision assistants and as explainers for unfamiliar content. However, they are not always correct, owing to data sparsity in domain-specific corpora or to hallucination. Given this, how much should we trust the responses from LLMs? This paper presents a novel way to evaluate uncertainty that captures directional instability: we construct a directed graph from entailment probabilities, apply a Random Walk Laplacian to account for the asymmetry of the directed graph, and aggregate the uncertainty from the eigenvalues of that Laplacian. We also provide a way to combine the semantic uncertainty of existing work with our proposed layer. In addition, this paper identifies vagueness issues in the raw response set and proposes an augmentation approach to mitigate them. Extensive empirical experiments demonstrate the superiority of the proposed solutions.

Paper Structure (30 sections, 18 equations, 9 figures, 4 tables)


Figures (9)

  • Figure 1: The left part is an example of directional entailment logic: (R1, Q) $\vdash$ (R2, Q) denotes the probability that R1 entails R2 given the context of question Q. The right part shows the difference between existing symmetric similarity and our proposed directed relations.
  • Figure 2: The overall directional uncertainty quantification (UQ) framework of D-UE (right) compared with traditional symmetric similarity-based uncertainty evaluation (left). The traditional method feeds symmetric similarity into an estimator (e.g., Numset, Symmetric Laplacian) that perceives only semantic uncertainty $U^{s}$, whereas D-UE perceives both directions of entailment between response pairs, enhanced by text similarity, and applies the Random Walk Laplacian to handle the resulting asymmetric graph. Specifically, after applying the Random Walk Laplacian, we derive its eigenvalues $\lambda_k$ and aggregate them following Eq. \ref{eq:egen} as the final uncertainty measurement $U^{d}_{Eigv}$. We also provide a way to fairly consider both semantic uncertainty and directional uncertainty in Section \ref{sec:agg}.
  • Figure 3: Comparison between D-UE and the baseline methods on AUARC. D-UE aggregates the directional entailment uncertainty with each of the semantic measures, and the evaluation improves on the CoQA dataset.
  • Figure 4: Comparison between D-UE and the baseline methods on AUROC. D-UE aggregates the directional entailment uncertainty with each of the semantic measures, and the evaluation consistently improves on the CoQA dataset.
  • Figure 5: The entailment probability map
  • ...and 4 more figures
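
The pipeline described in the Figure 2 caption (directed entailment graph, Random Walk Laplacian, eigenvalue aggregation) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration: the function names are hypothetical, the entailment matrix is supplied by hand rather than produced by an entailment model, and the aggregation $\sum_k \max(0, 1 - \mathrm{Re}(\lambda_k))$ is borrowed from the common eigenvalue-based estimator for symmetric graphs, since the paper's own aggregation equation is not reproduced in this excerpt. Real parts are taken because the Laplacian of an asymmetric graph may have complex eigenvalues.

```python
import numpy as np

def random_walk_laplacian(weights):
    """Return L_rw = I - D^{-1} W for a (possibly asymmetric) weight matrix W.

    D is the diagonal matrix of out-degrees (row sums of W).
    """
    weights = np.asarray(weights, dtype=float)
    out_degree = weights.sum(axis=1)
    # Nodes with zero out-degree get an inverse degree of 0 (no division by zero).
    inv_degree = np.divide(
        1.0, out_degree, out=np.zeros_like(out_degree), where=out_degree > 0
    )
    return np.eye(weights.shape[0]) - inv_degree[:, None] * weights

def directional_uncertainty(entail_prob):
    """Aggregate Laplacian eigenvalues into a single uncertainty score.

    entail_prob[i, j] is the probability that response i entails response j.
    ASSUMPTION: the aggregation sum_k max(0, 1 - Re(lambda_k)) is illustrative
    and not confirmed as the paper's equation.
    """
    laplacian = random_walk_laplacian(entail_prob)
    eigenvalues = np.linalg.eigvals(laplacian)  # complex for asymmetric input
    return float(np.sum(np.clip(1.0 - eigenvalues.real, 0.0, None)))

# Hand-written, hypothetical entailment probabilities for three responses.
agree = np.array([[1.0, 0.9, 0.9],
                  [0.9, 1.0, 0.9],
                  [0.9, 0.9, 1.0]])
disagree = np.array([[1.0, 0.1, 0.1],
                     [0.1, 1.0, 0.1],
                     [0.1, 0.1, 1.0]])
one_way = np.array([[1.0, 0.9, 0.8],   # response 0 entails the others,
                    [0.1, 1.0, 0.2],   # but is not entailed back
                    [0.3, 0.1, 1.0]])

for name, matrix in [("agree", agree), ("disagree", disagree), ("one_way", one_way)]:
    print(name, round(directional_uncertainty(matrix), 4))
```

Under this assumed aggregation the score ranges from 1 (all responses mutually entail each other) up to the number of responses (no response entails any other), so it behaves like a soft count of distinct meanings. The `one_way` matrix is the case a symmetric similarity measure cannot represent, which is the motivation the captions give for keeping direction.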