On the Role of Unobserved Sequences on Sample-based Uncertainty Quantification for LLMs

Lucie Kunitomo-Jacquin; Edison Marrese-Taylor; Ken Fukuda

TL;DR

The paper addresses uncertainty quantification (UQ) for large language models, observing that standard entropy-based metrics ignore the probability mass of unobserved sequences. It introduces the unobserved probability, $\mathbb{P}(\bar{A}|x)$, and presents two practical variants, EOS-UP and LN-UP, that incorporate this missing mass into UQ computed from sampled outputs. In experiments on Falcon-40B-Instruct with TriviaQA, EOS-UP achieves AUROC comparable to predictive entropy and remains robust when the number of samples $M$ is small, while LN-UP underperforms. The work suggests integrating unobserved probability into existing UQ frameworks, potentially via evidential theories, to capture epistemic and aleatoric uncertainty in LLM outputs more comprehensively.

Abstract

Quantifying uncertainty in large language models (LLMs) is important for safety-critical applications because it helps spot incorrect answers, known as hallucinations. One major family of uncertainty quantification methods estimates the entropy of the distribution over the LLM's potential output sequences. This estimate is computed from a set of output sequences, and their associated probabilities, obtained by querying the LLM several times. In this paper, we argue and show experimentally that the probability of unobserved sequences plays a crucial role, and we recommend that future research integrate it to enhance such LLM uncertainty quantification methods.
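
To make the quantities above concrete, the following is a minimal sketch, not the authors' implementation. It assumes each of the $M$ queries returns an output sequence together with its full (not length-normalized) log-probability, takes $A$ to be the set of distinct observed sequences, and computes the unobserved probability $\mathbb{P}(\bar{A}|x)$ as one minus the total probability of $A$. The entropy function uses one common Monte Carlo estimator (the average negative log-probability of the samples), which may differ from the estimator used in the paper. The function names and the toy probabilities are hypothetical, and the EOS-UP and LN-UP variants are not reproduced here because their definitions are not given in this summary.

```python
import math


def unobserved_probability(samples):
    """Estimate P(A-bar | x): probability mass of sequences never sampled.

    `samples` is a list of (sequence, log_prob) pairs from M queries.
    Assumption: log_prob is the full sequence log-probability under the model.
    """
    # Each distinct sequence contributes its probability exactly once,
    # no matter how many times it was sampled.
    distinct = {}
    for sequence, log_prob in samples:
        distinct[sequence] = log_prob
    observed_mass = sum(math.exp(lp) for lp in distinct.values())
    # Clamp at zero to guard against floating-point overshoot.
    return max(0.0, 1.0 - observed_mass)


def predictive_entropy_mc(samples):
    """Monte Carlo entropy estimate: mean negative log-probability of samples."""
    return -sum(log_prob for _, log_prob in samples) / len(samples)


# Toy example with made-up probabilities (M = 3, two distinct sequences).
samples = [
    ("Paris", math.log(0.5)),
    ("Paris", math.log(0.5)),
    ("Lyon", math.log(0.2)),
]
up = unobserved_probability(samples)
entropy = predictive_entropy_mc(samples)
print(f"unobserved probability: {up:.3f}")
print(f"MC predictive entropy:  {entropy:.3f}")
```

In this toy case the two distinct sequences cover 0.7 of the probability mass, so about 0.3 remains unobserved. The entropy estimate alone does not expose that remainder, which is the gap the paper draws attention to.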
