Explaining Sources of Uncertainty in Automated Fact-Checking
Jingyi Sun, Greta Warren, Irina Shklovski, Isabelle Augenstein
TL;DR
The paper tackles the problem of explaining uncertainty in automated fact-checking by grounding model uncertainty in explicit span-level interactions between a claim and multiple pieces of evidence. It introduces CLUE, a plug-and-play framework that (1) identifies, without supervision, conflict- and agreement-bearing spans across claim–evidence and evidence–evidence pairs, (2) quantifies predictive uncertainty as the entropy $u(X)$ over the candidate labels, and (3) generates uncertainty explanations through instruction-based prompting or attention steering centered on the extracted spans. Empirically, CLUE improves faithfulness to the model's uncertainty and consistency with fact-checking labels across three open-weight LLMs and two health-domain datasets, and human evaluators find its explanations more helpful, informative, and coherent than those from baseline prompts. The work shows that grounding explanations in concrete evidentiary conflicts yields more actionable support for fact-checking and generalizes to other information synthesis tasks, without any model fine-tuning.
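To make step (2) concrete, the following is a minimal sketch of an entropy-based uncertainty score. It assumes $u(X)$ is the Shannon entropy of the model's label distribution, $u(X) = -\sum_i P(y_i \mid X) \log P(y_i \mid X)$, where $P(y_i \mid X)$ is the probability the model assigns to candidate label $y_i$ given input $X$; the paper's exact formulation and normalization may differ. The function name `predictive_entropy`, the label names, and the probability values are hypothetical and chosen only for illustration.

```python
import math


def predictive_entropy(label_probs):
    """Shannon entropy (in nats) of a distribution over candidate labels.

    `label_probs` maps each candidate label to a non-negative score.
    Scores are renormalized so they sum to 1 over the candidate labels
    (an assumption of this sketch, not a detail taken from the paper).
    """
    total = sum(label_probs.values())
    if total <= 0:
        raise ValueError("label scores must sum to a positive value")

    entropy = 0.0
    for score in label_probs.values():
        p = score / total
        if p > 0:  # a zero-probability label contributes nothing
            entropy -= p * math.log(p)
    return entropy


# Hypothetical label distributions for two inputs.
confident = {"SUPPORTS": 0.90, "REFUTES": 0.05, "NOT ENOUGH INFO": 0.05}
conflicted = {"SUPPORTS": 0.40, "REFUTES": 0.40, "NOT ENOUGH INFO": 0.20}

if __name__ == "__main__":
    print("confident :", predictive_entropy(confident))
    print("conflicted:", predictive_entropy(conflicted))
```

Under this definition, an input whose evidence pulls the model toward competing labels receives a higher score than one where a single label dominates, and the score is maximal, $\log K$ for $K$ labels, when the distribution is uniform.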
Abstract
Understanding sources of a model's uncertainty regarding its predictions is crucial for effective human-AI collaboration. Prior work proposes using numerical uncertainty or hedges ("I'm not sure, but ..."), which do not explain uncertainty that arises from conflicting evidence, leaving users unable to resolve disagreements or rely on the output. We introduce CLUE (Conflict-and-Agreement-aware Language-model Uncertainty Explanations), the first framework to generate natural language explanations of model uncertainty by (i) identifying relationships between spans of text that expose claim-evidence or inter-evidence conflicts and agreements that drive the model's predictive uncertainty in an unsupervised way, and (ii) generating explanations via prompting and attention steering that verbalize these critical interactions. Across three language models and two fact-checking datasets, we show that CLUE produces explanations that are more faithful to the model's uncertainty and more consistent with fact-checking decisions than prompting for uncertainty explanations without span-interaction guidance. Human evaluators judge our explanations to be more helpful, more informative, less redundant, and more logically consistent with the input than this baseline. CLUE requires no fine-tuning or architectural changes, making it plug-and-play for any white-box language model. By explicitly linking uncertainty to evidence conflicts, it offers practical support for fact-checking and generalises readily to other tasks that require reasoning over complex information.
