Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach

Linyu Liu, Yu Pan, Xiaocheng Li, Guanting Chen

TL;DR

The paper formalizes uncertainty estimation for LLMs as a supervised regression problem using labeled response-quality metrics. It proposes a pipeline that extracts features from white-box hidden activations and grey-box probability/entropy signals, enabling three regimes: white-box, grey-box, and black-box uncertainty estimation. Empirical results show the supervised approach generally outperforms unsupervised baselines across QA, translation, and MMLU tasks, with hidden activations providing notably strong signals and transferability to out-of-distribution settings. The work also clarifies the distinction between uncertainty estimation and calibration and demonstrates practical applicability to closed-source LLMs via black-box estimation.

Abstract

In this paper, we study the problem of uncertainty estimation and calibration for LLMs. We begin by formulating the uncertainty estimation problem, a relevant yet underexplored area in existing literature. We then propose a supervised approach that leverages labeled datasets to estimate the uncertainty in LLMs' responses. Based on the formulation, we illustrate the difference between the uncertainty estimation for LLMs and that for standard ML models and explain why the hidden neurons of the LLMs may contain uncertainty information. Our designed approach demonstrates the benefits of utilizing hidden activations to enhance uncertainty estimation across various tasks and shows robust transferability in out-of-distribution settings. We distinguish the uncertainty estimation task from the uncertainty calibration task and show that better uncertainty estimation leads to better calibration performance. Furthermore, our method is easy to implement and adaptable to different levels of model accessibility including black box, grey box, and white box.
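The abstract separates uncertainty estimation from calibration without naming metrics. One common way to make the distinction concrete, assumed here rather than taken from the paper, is to score estimation by how well the confidence scores rank correct answers above incorrect ones (AUROC) and to score calibration by how closely the scores match empirical accuracy (expected calibration error, ECE). The sketch below uses hypothetical toy data and helper names; it shows that a strictly increasing rescaling of the scores leaves the ranking metric unchanged while the calibration metric moves.

```python
def auroc(scores, labels):
    """Probability that a random correct answer outscores a random
    incorrect one; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def expected_calibration_error(scores, labels, n_bins=5):
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # clamp s == 1.0 into last bin
        bins[idx].append((s, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(s for s, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += len(b) / len(scores) * abs(accuracy - mean_conf)
    return ece


# Hypothetical toy data: 1 = response judged correct, 0 = incorrect.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.85, 0.65, 0.55, 0.45, 0.35, 0.75, 0.15]

# Strictly increasing transform: the ordering of the scores is preserved,
# but every score is squeezed into the range [0.5, 0.6].
squashed = [0.5 + 0.1 * s for s in scores]

auroc_raw = auroc(scores, labels)
auroc_squashed = auroc(squashed, labels)
ece_raw = expected_calibration_error(scores, labels)
ece_squashed = expected_calibration_error(squashed, labels)

print("AUROC:", auroc_raw, auroc_squashed)
print("ECE:  ", ece_raw, ece_squashed)
```

Because AUROC depends only on the ordering of the scores, it cannot distinguish the two score sets, whereas ECE can. This is only an illustration of why the two tasks need separate evaluation; it does not reproduce the paper's experiments.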

Paper Structure

This paper contains 31 sections, 2 theorems, 10 equations, 16 figures, 5 tables, 1 algorithm.

Key Result

Proposition 4.1

Let $\mathcal{F}$ be the class of measurable functions that map from $\mathbb{R}^d$ to $[0,1]$. Under the cross-entropy loss $l(y, \hat{y})= -y\log (\hat{y})-(1-y)\log (1-\hat{y})$, the function $f^* \in \mathcal{F}$ that minimizes the expected loss $\mathbb{E}[l(Y, f(\bm{X}))]$ is the Bayes optimal classifier $f^*(\bm{x}) = \mathbb{P}(Y=1|\bm{X}=\bm{x})$, where the expectation and the probability are taken with respect to $(\bm{X},Y)\sim\mathcal{P}.$
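
A short pointwise argument, given here as a sketch and not as the paper's own proof, shows why the conditional probability is the minimizer. The loss is taken with its usual negative sign, so that it is a quantity to minimize, and the shorthand $p$, $q$, and $g$ is introduced only for this illustration. Fix $\bm{x}$ and write $p = \mathbb{P}(Y=1|\bm{X}=\bm{x})$ and $q = f(\bm{x})$. The conditional expected loss is

$$g(q) = \mathbb{E}\left[\,l(Y, q) \mid \bm{X}=\bm{x}\,\right] = -p\log (q) - (1-p)\log (1-q),$$

with derivatives

$$g'(q) = -\frac{p}{q} + \frac{1-p}{1-q}, \qquad g''(q) = \frac{p}{q^{2}} + \frac{1-p}{(1-q)^{2}} > 0 \quad \text{for } q \in (0,1).$$

Setting $g'(q) = 0$ gives $q = p$ for $p \in (0,1)$, and strict convexity makes this the unique minimizer; the boundary cases $p \in \{0,1\}$ are also minimized at $q = p$ under the convention $0\log 0 = 0$. Since this holds for every $\bm{x}$, choosing $f^*(\bm{x}) = \mathbb{P}(Y=1|\bm{X}=\bm{x})$ minimizes the expected loss over $\mathcal{F}$.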

Figures (16)

  • Figure 1: An example to illustrate the uncertainty estimation task. The LLM randomly generates an answer to the question (e.g., "It's Paris", "Paris", or "London"). The goal of uncertainty estimation is to assign a confidence score to the question-answer pair, where a higher score indicates higher confidence in the correctness of the answer.
  • Figure 2: Illustration of our proposed supervised method. The tool LLM is an open-source LLM and can be different from the target LLM. In the training phase, where the reference response is available, we train the uncertainty estimator using the quality of the response as the label. In the test phase, the uncertainty estimator predicts the quality of the generated response to obtain an uncertainty score.
  • Figure 3: Performance comparison of using hidden activations from different tokens and layers as features in the Wb-S method. The bars filled with '/' and '.' represent the activations averaged over the answer tokens and the hidden activation of the last token, respectively. The green and orange bars denote the activations from the middle and the last layer, respectively.
  • Figure 4: (Left) Using the hidden activations of LLaMA2-7B and LLaMA2-13B to estimate the uncertainty of the answer provided by Gemma-7B. (Middle) Using the hidden activations of Gemma-2B and Gemma-7B to estimate the uncertainty of the answer provided by LLaMA2-7B. (Right) Using the hidden activations of Gemma-2B and Gemma-7B to estimate the uncertainty of the answer provided by LLaMA3-8B.
  • Figure 5: Histograms of the pairwise correlations on the TriviaQA task between the neuron activations and the labels (whether the LLM's response is correct), where the neuron values are the last-token hidden activations of answers from the middle layer (upper) and the last layer (lower) of two models.
  • ...and 11 more figures
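
The training and test phases described in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the label is taken to be binary (1 = correct response), only two grey-box features are used (mean maximum token probability and mean token entropy), and the estimator is a plain logistic regression fitted by gradient descent. The function names `grey_box_features`, `train_estimator`, and `confidence`, and all toy numbers, are hypothetical. In the white-box setting, hidden activations from the tool LLM would be appended to the feature vector in the same way.

```python
import math


def grey_box_features(token_dists):
    """Summarise the per-token output distributions of one response.

    token_dists: list of probability lists, one per generated token.
    Returns [mean max-probability, mean entropy in nats].
    """
    max_probs = [max(d) for d in token_dists]
    entropies = [-sum(p * math.log(p) for p in d if p > 0) for d in token_dists]
    n = len(token_dists)
    return [sum(max_probs) / n, sum(entropies) / n]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_estimator(features, labels, lr=0.5, epochs=2000):
    """Training phase: fit logistic regression by batch gradient descent
    on the cross-entropy loss, using response quality as the label."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def confidence(model, x):
    """Test phase: predicted probability that the response is correct."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical toy data: peaked token distributions paired with correct
# answers, flat distributions paired with incorrect ones.
peaked = [[0.9, 0.05, 0.05], [0.85, 0.1, 0.05]]
flat = [[0.4, 0.3, 0.3], [0.34, 0.33, 0.33]]
train_x = [
    grey_box_features(peaked),
    grey_box_features(flat),
    grey_box_features([peaked[0]]),
    grey_box_features([flat[1]]),
]
train_y = [1, 0, 1, 0]

model = train_estimator(train_x, train_y)

# Unseen responses: no reference answer is needed at test time.
test_confident = confidence(model, grey_box_features([[0.95, 0.03, 0.02]]))
test_unsure = confidence(model, grey_box_features([[0.36, 0.34, 0.30]]))
print(test_confident, test_unsure)
```

The reference response is used only to produce `train_y`; once the estimator is fitted, scoring a new response requires only the features, which is what allows the same recipe to be applied with features from a separate open-source tool LLM when the target LLM is closed.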

Theorems & Definitions (2)

  • Proposition 4.1
  • Corollary 4.2