Verbalizing LLM's Higher-order Uncertainty via Imprecise Probabilities

Anita Yang, Krikamol Muandet, Michele Caprio, Siu Lun Chau, Masaki Adachi

TL;DR

This work proposes novel prompt-based uncertainty elicitation techniques grounded in imprecise probabilities, a principled framework for representing and eliciting higher-order uncertainty. It also introduces general-purpose prompting and post-processing procedures that directly elicit and quantify both first-order and second-order uncertainty.

Abstract

Despite the growing demand for eliciting uncertainty from large language models (LLMs), empirical evidence suggests that LLM behavior is not always adequately captured by the elicitation techniques developed under the classical probabilistic uncertainty framework. This mismatch leads to systematic failure modes, particularly in settings that involve ambiguous question-answering, in-context learning, and self-reflection. To address this, we propose novel prompt-based uncertainty elicitation techniques grounded in imprecise probabilities, a principled framework for representing and eliciting higher-order uncertainty. Here, first-order uncertainty captures uncertainty over possible responses to a prompt, while second-order uncertainty (uncertainty about uncertainty) quantifies indeterminacy in the underlying probability model itself. We introduce general-purpose prompting and post-processing procedures to directly elicit and quantify both orders of uncertainty, and demonstrate their effectiveness across diverse settings. Our approach enables more faithful uncertainty reporting from LLMs, improving credibility and supporting downstream decision-making.
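
To make the two orders of uncertainty concrete, the following is a minimal Python sketch and not the paper's own procedure. It assumes that each candidate answer comes with an elicited probability interval (a lower and an upper probability), uses the interval width as an illustrative second-order score, and applies interval dominance, a standard decision rule from the imprecise-probability literature, to select answers. The names `ProbInterval` and `undominated` and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbInterval:
    """Hypothetical container: lower and upper probability for one answer."""
    lower: float
    upper: float

    def __post_init__(self):
        # Reject intervals that are not valid probability bounds.
        if not 0.0 <= self.lower <= self.upper <= 1.0:
            raise ValueError("need 0 <= lower <= upper <= 1")

    @property
    def width(self) -> float:
        # Illustrative second-order score (an assumption, not the paper's
        # definition): a wider interval means the probability itself is
        # less determined. A precise probability has width 0.
        return self.upper - self.lower


def undominated(intervals: dict[str, ProbInterval]) -> list[str]:
    """Interval dominance: an answer is dropped if some other answer's
    lower probability exceeds this answer's upper probability."""
    best_lower = max(iv.lower for iv in intervals.values())
    return [ans for ans, iv in intervals.items() if iv.upper >= best_lower]


# Made-up intervals, standing in for values an LLM might verbalize.
elicited = {
    "Paris": ProbInterval(0.60, 0.90),
    "Lyon": ProbInterval(0.05, 0.30),
    "Marseille": ProbInterval(0.00, 0.65),
}

for answer, iv in elicited.items():
    print(f"{answer}: [{iv.lower:.2f}, {iv.upper:.2f}] width={iv.width:.2f}")

print("Undominated answers:", undominated(elicited))
```

In this toy case "Lyon" is ruled out because its upper probability lies below the lower probability of "Paris", while "Marseille" survives only because its interval is wide. Under this reading, a surviving set with more than one answer signals second-order indeterminacy rather than a confident tie.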


Paper Structure

This paper contains 34 sections, 10 equations, 17 figures, 1 table, and 2 algorithms.

Figures (17)

  • Figure 1: Collection of failure modes in prior verbalized uncertainty scores. (a) Example of an ambiguous question. (b) The prior uncertainty score fails to distinguish between clear and ambiguous question distributions. (c) The prior score also fails to track the decrease in prediction error, which should reflect reduced uncertainty as more in-context examples are provided. (d) Self-reflection on answer-wise probability/uncertainty should explain the rationale behind answer selection, but it often fails to do so.
  • Figure 2: Our imprecise probabilities–based approach. (a) Classical precise probability provides point estimates. (b) Imprecise probabilities instead represent uncertainty as intervals. (c) This enables more reasonable elicitation of question ambiguity. (d) It more closely tracks predictive error. (e) It aligns the LLM’s answer selection with the selection implied by its imprecise probability estimates.
  • Figure 5: Learnable vs. noisy transforms.
  • Figure 6: (a) For first-order uncertainty estimation, both the vanilla and De Finetti methods capture the underlying ambiguity noise $p$. (b) For second-order uncertainty, our methods stay flat, supporting the disentanglement of uncertainty sources.
  • Figure 7: (a) For individual EU elicitation, vanilla incorrectly increases uncertainty, whereas ProbInt-EU tracks prediction error. (b) For group EU elicitation, our Credal-EU substantially improves AUROC.
  • ...and 12 more figures