The Anatomy of Uncertainty in LLMs

Aditya Taparia, Ransalu Senanayake, Kowshik Thopalli, Vivek Narayanaswamy

Abstract

Understanding why a large language model (LLM) is uncertain about its responses is important for reliable deployment. Current approaches, which either provide a single uncertainty score or rely on the classical aleatoric-epistemic dichotomy, fail to offer actionable insights for improving the generative model. Recent studies have also shown that such methods are insufficient for understanding uncertainty in LLMs. In this work, we advocate for an uncertainty decomposition framework that dissects LLM uncertainty into three distinct semantic components: (i) input ambiguity, arising from ambiguous prompts; (ii) knowledge gaps, caused by insufficient parametric evidence; and (iii) decoding randomness, stemming from stochastic sampling. Through a series of experiments we demonstrate that the dominance of these components can shift across model size and task. Our framework provides a principled basis for auditing LLM reliability and detecting hallucinations, paving the way for targeted interventions and more trustworthy systems.

Paper Structure

This paper contains 27 sections, 11 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Uncertainty decomposition on TriviaQA examples of Gemma 3 27B model. Our framework identifies the dominant source of uncertainty---input ambiguity (left), knowledge gaps (middle), or decoding randomness (right)---and maps each to a targeted mitigation action.
  • Figure 2: Illustration of input ambiguity estimation. We generate semantically equivalent paraphrases of the original question, obtain one response for each paraphrase using the same model and decoding policy, and group the responses into semantic clusters.
  • Figure 3: Illustration of knowledge-gap uncertainty estimation. For a fixed input and decoding policy, we query an ensemble of LoRA-adapted model realizations and group the resulting responses into semantic clusters. Disagreement across ensemble members reflects uncertainty arising from parametric knowledge gaps.
  • Figure 4: (a) Failure prediction performance (AUROC) of Input and Decoding uncertainty across the Gemma 3 model family on TriviaQA. As models scale, input ambiguity becomes a more reliable predictor of failure. (b) Comparison of failure prediction AUROC for Decoding Uncertainty when calculated using different decoding strategies. Stochastic methods (e.g., Top-k, Top-p) are significantly more effective at revealing uncertainty than deterministic ones (e.g., Greedy).
  • Figure 5: Joint analysis of Input Ambiguity ($U_{\text{input}}$) and Decoding Randomness ($U_{\text{dec}}$) on TriviaQA, partitioned by uncertainty quantiles. The heatmaps reveal an important insight about overconfidence: while the failure rate (b) increases with uncertainty, the model is most poorly calibrated (highest ECE, panel a) when it appears most confident (low uncertainty).
  • ...and 1 more figure
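The estimation procedure described in Figures 2 and 3 — sample multiple responses, group them into semantic clusters, and measure disagreement — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `are_equivalent` predicate is a hypothetical placeholder for a semantic-equivalence check (e.g., a bidirectional-entailment or NLI-based judge), and a simple cluster-frequency entropy stands in for whatever disagreement measure the paper uses.

```python
import math

def cluster_responses(responses, are_equivalent):
    """Greedy semantic clustering: place each response in the first
    cluster whose representative it is judged equivalent to, else
    start a new cluster."""
    clusters = []  # each cluster is a list of responses
    for r in responses:
        for c in clusters:
            if are_equivalent(c[0], r):
                c.append(r)
                break
        else:
            clusters.append([r])
    return clusters

def cluster_entropy(clusters):
    """Shannon entropy (nats) of the empirical distribution over
    semantic clusters; 0 means all responses agree semantically."""
    n = sum(len(c) for c in clusters)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)
```

For input ambiguity (Figure 2), `responses` would hold one answer per paraphrase of the question; for knowledge-gap uncertainty (Figure 3), one answer per LoRA-adapted ensemble member under a fixed input and decoding policy. With a toy case-insensitive equivalence check, `["Paris", "paris", "Lyon", "Paris"]` yields two clusters of sizes 3 and 1 and a nonzero entropy, while unanimous responses yield entropy 0.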