On the Role of Unobserved Sequences on Sample-based Uncertainty Quantification for LLMs
Lucie Kunitomo-Jacquin, Edison Marrese-Taylor, Ken Fukuda
TL;DR
The paper tackles the challenge of uncertainty quantification for large language models by highlighting that standard entropy-based metrics miss the probability mass of unobserved sequences. It introduces the unobserved-probability concept, $\mathbb{P}(\bar{A}|x)$, and presents two practical variants, EOS-UP and LN-UP, to incorporate missing mass into UQ computed from sampled outputs. Through experiments on Falcon-40B-Instruct with TriviaQA, EOS-UP achieves AUROC performance comparable to predictive entropy and demonstrates robustness when the number of samples $M$ is small, while LN-UP underperforms. The work suggests integrating unobserved probability into existing UQ frameworks, potentially via evidential theories, to more comprehensively capture epistemic and aleatoric uncertainty in LLM outputs.
Abstract
Quantifying uncertainty in large language models (LLMs) is important for safety-critical applications because it helps spot incorrect answers, known as hallucinations. One major family of uncertainty quantification methods estimates the entropy of the distribution over the LLM's potential output sequences. The estimate is computed from a set of output sequences and their associated probabilities, obtained by querying the LLM several times. In this paper, we argue and show experimentally that the probability of unobserved sequences plays a crucial role, and we recommend that future research integrate it into such LLM uncertainty quantification methods.
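To make the idea of missing probability mass concrete, the following minimal sketch computes two quantities from a handful of sampled sequences: a common Monte Carlo estimate of predictive entropy, and the mass left over for sequences that were never sampled. It assumes that $A$ denotes the set of distinct observed sequences, so that the unobserved mass is $1 - \sum_{s \in A} \mathbb{P}(s|x)$ by the complement rule. The function names and the toy probabilities are hypothetical, and this is not the EOS-UP or LN-UP variant from the paper, whose exact definitions are not given here.

```python
import math

def sample_entropy(log_probs):
    """Monte Carlo entropy estimate: minus the mean log-probability of the samples."""
    return -sum(log_probs) / len(log_probs)

def unobserved_probability(sequences, log_probs):
    """Mass not covered by the distinct sampled sequences (assumed definition)."""
    distinct = {}
    for seq, lp in zip(sequences, log_probs):
        # A sequence sampled several times contributes its probability only once.
        distinct[seq] = lp
    observed_mass = sum(math.exp(lp) for lp in distinct.values())
    # Clamp at zero to guard against floating-point rounding.
    return max(0.0, 1.0 - observed_mass)

# Toy example with made-up sequence probabilities (M = 3 samples).
sequences = ["Paris", "Paris", "Lyon"]
log_probs = [math.log(0.6), math.log(0.6), math.log(0.1)]

entropy = sample_entropy(log_probs)
missing = unobserved_probability(sequences, log_probs)
```

In this toy case the two distinct answers cover 0.7 of the probability mass, so 0.3 remains on sequences the sampler never produced. The entropy estimate uses only the sampled log-probabilities and does not account for that remainder.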
