Verbalizing LLM's Higher-order Uncertainty via Imprecise Probabilities

Anita Yang; Krikamol Muandet; Michele Caprio; Siu Lun Chau; Masaki Adachi

TL;DR

This work proposes novel prompt-based uncertainty elicitation techniques grounded in imprecise probabilities, a principled framework for representing and eliciting higher-order uncertainty, and introduces general-purpose prompting and post-processing procedures to directly elicit and quantify both first-order and second-order uncertainty.

Abstract

Despite the growing demand for eliciting uncertainty from large language models (LLMs), empirical evidence suggests that LLM behavior is not always adequately captured by the elicitation techniques developed under the classical probabilistic uncertainty framework. This mismatch leads to systematic failure modes, particularly in settings that involve ambiguous question-answering, in-context learning, and self-reflection. To address this, we propose novel prompt-based uncertainty elicitation techniques grounded in imprecise probabilities, a principled framework for representing and eliciting higher-order uncertainty. Here, first-order uncertainty captures uncertainty over possible responses to a prompt, while second-order uncertainty (uncertainty about uncertainty) quantifies indeterminacy in the underlying probability model itself. We introduce general-purpose prompting and post-processing procedures to directly elicit and quantify both orders of uncertainty, and demonstrate their effectiveness across diverse settings. Our approach enables more faithful uncertainty reporting from LLMs, improving credibility and supporting downstream decision-making.
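To make the distinction between the two orders of uncertainty concrete, the following is a minimal sketch of one standard imprecise-probability representation, probability intervals. It is not the paper's prompting or post-processing procedure, and it does not implement the Maximum Mean Imprecision measure listed in the outline. The names `is_proper` and `mean_width`, and all numeric values, are hypothetical and chosen only for illustration. Each answer carries a lower and an upper probability; the interval width serves here as a simple stand-in for second-order uncertainty.

```python
from typing import Dict, Tuple

# Hypothetical representation: answer -> (lower probability, upper probability).
Interval = Tuple[float, float]


def is_proper(intervals: Dict[str, Interval], tol: float = 1e-9) -> bool:
    """Return True if at least one precise distribution fits inside every interval.

    For probability intervals this holds exactly when each interval is
    well-formed and sum(lower) <= 1 <= sum(upper).
    """
    well_formed = all(0.0 <= lo <= hi <= 1.0 for lo, hi in intervals.values())
    lower_sum = sum(lo for lo, _ in intervals.values())
    upper_sum = sum(hi for _, hi in intervals.values())
    return well_formed and lower_sum <= 1.0 + tol and upper_sum >= 1.0 - tol


def mean_width(intervals: Dict[str, Interval]) -> float:
    """Average interval width: an illustrative proxy for second-order uncertainty."""
    return sum(hi - lo for lo, hi in intervals.values()) / len(intervals)


# Illustrative (made-up) elicited intervals for a clear and an ambiguous question.
clear = {"Paris": (0.90, 0.95), "Lyon": (0.03, 0.06), "Other": (0.01, 0.04)}
ambiguous = {"A": (0.20, 0.70), "B": (0.20, 0.70), "C": (0.05, 0.30)}

if __name__ == "__main__":
    for name, iv in [("clear", clear), ("ambiguous", ambiguous)]:
        print(name, "proper:", is_proper(iv), "mean width:", round(mean_width(iv), 3))
```

In this sketch a precise probability is the special case where every interval has zero width, so wider intervals signal that the underlying probability model itself is less determinate.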

Paper Structure (34 sections, 10 equations, 17 figures, 1 table, 2 algorithms)

Introduction
Background
LLM uncertainty quantification
Imprecise probabilities
Uncertainty Elicitation via Imprecise Probabilities
First-order Uncertainty
Second-order Uncertainty
Maximum Mean Imprecision
Related works
Synthetic Experiment
First-order noise
Second-order denoise
Group uncertainty elicitation
Real-world QA experiment
Ambiguity and correctness
...and 19 more sections

Figures (17)

Figure 1: Collection of failure modes in prior verbalized uncertainty scores. (a) Example of an ambiguous question. (b) The prior uncertainty score fails to distinguish between clear and ambiguous question distributions. (c) The prior score also fails to track the decrease in prediction error, which should reflect reduced uncertainty as more in-context examples are provided. (d) Self-reflection on answer-wise probability/uncertainty should explain the rationale behind answer selection, but it often fails to do so.
Figure 2: Our imprecise probabilities–based approach. (a) Classical precise probability provides point estimates. (b) Imprecise probabilities instead represent uncertainty as intervals. (c) This enables more reasonable elicitation of question ambiguity. (d) It more closely tracks predictive error. (e) It aligns the LLM’s answer selection with the selection implied by its imprecise probability estimates.
Figure 5: Learnable vs. noisy transforms.
Figure 6: (a) For first-order uncertainty estimation, both vanilla and De Finetti capture the underlying ambiguity noise $p$, but (b) for second-order uncertainty, our methods stay flat, supporting the disentanglement of uncertainty sources.
Figure 7: (a) For individual EU elicitation, vanilla incorrectly increases uncertainty, whereas ProbInt-EU tracks prediction error. (b) For group EU elicitation, our Credal-EU substantially improves AUROC.
...and 12 more figures
