Can Linear Probes Measure LLM Uncertainty?

Ramzi Dakhmouche, Adrien Letellier, Hossein Gorji

TL;DR

The paper addresses uncertainty quantification for large language models in discrete-choice tasks, where the maximum softmax probability (MSP) remains the standard baseline. It introduces Bayesian Linear Lens (BLL), a lightweight framework that learns layer-wise Bayesian linear models to approximate activations conditioned on truthfulness, then combines the layer-level posteriors through sparse regression into a global uncertainty score. The authors compare two feature designs, posterior log-likelihoods and log-likelihood ratios, and report consistent AUROC gains over the MSP baseline across multiple LLMs on MMLU, with additional gains for open-ended questions on TriviaQA. The work suggests that principled Bayesian analysis of simple linear probes can achieve strong uncertainty quantification without heavy ensembles, supporting scalable, interpretable UQ for safe AI deployment.
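
For orientation, the MSP baseline and the AUROC metric named in this summary can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' evaluation code: the function names `max_softmax_probability` and `auroc`, and the assumption that each question yields one logit per answer choice, are hypothetical choices made here.

```python
import numpy as np

def max_softmax_probability(logits):
    """Confidence score per question: the largest softmax probability.

    `logits` has shape (n_questions, n_choices); this layout is an assumption.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoid overflow in exp
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.max(axis=1)

def auroc(scores, labels):
    """AUROC from the rank-sum (Mann-Whitney U) statistic, with average ranks for ties.

    `labels` marks correct answers (1) vs. incorrect ones (0); both classes must occur.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        # extend j over the block of scores tied with position i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0  # average 1-based rank
        i = j + 1
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

An uncertainty method is then judged by how well its score separates correct from incorrect answers: `auroc(max_softmax_probability(logits), is_correct)` gives the baseline number that a BLL-style score would be compared against.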

Abstract

Effective Uncertainty Quantification (UQ) represents a key aspect for reliable deployment of Large Language Models (LLMs) in automated decision-making and beyond. Yet, for LLM generation with multiple choice structure, the state-of-the-art in UQ is still dominated by the naive baseline given by the maximum softmax score. To address this shortcoming, we demonstrate that taking a principled approach via Bayesian statistics leads to improved performance despite leveraging the simplest possible model, namely linear regression. More precisely, we propose to train multiple Bayesian linear models, each predicting the output of a layer given the output of the previous one. Based on the obtained layer-level posterior distributions, we infer the global uncertainty level of the LLM by identifying a sparse combination of distributional features, leading to an efficient UQ scheme. Numerical experiments on various LLMs show consistent improvement over state-of-the-art baselines.
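
The pipeline described in the abstract (one Bayesian linear model per layer transition, then a sparse combination of distributional features) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption rather than the paper's implementation: the conjugate Gaussian prior with fixed precisions `alpha` and `beta`, the use of the posterior predictive log-likelihood as the per-layer feature, and Lasso solved by ISTA as a stand-in for the sparse regression step. All function names are hypothetical.

```python
import numpy as np

def fit_bayesian_linear(X, Y, alpha=1.0, beta=1.0):
    """Posterior over W in Y = X W + noise.

    Assumed model: prior W_ij ~ N(0, 1/alpha), isotropic noise with precision beta.
    Returns the posterior mean M (d_in, d_out) and covariance S (d_in, d_in),
    where S is shared by all output dimensions.
    """
    d = X.shape[1]
    S = np.linalg.inv(alpha * np.eye(d) + beta * X.T @ X)
    M = beta * S @ X.T @ Y
    return M, S

def predictive_log_likelihood(x, y, M, S, beta=1.0):
    """Log density of an observed next-layer activation y under the posterior predictive at x."""
    mean = x @ M
    var = 1.0 / beta + x @ S @ x  # scalar predictive variance per output dimension
    resid = y - mean
    return -0.5 * (len(y) * np.log(2 * np.pi * var) + resid @ resid / var)

def layer_features(hidden_states, models, beta=1.0):
    """One feature per layer transition: log-likelihood of layer l+1 given layer l."""
    return np.array([
        predictive_log_likelihood(hidden_states[l], hidden_states[l + 1], M, S, beta)
        for l, (M, S) in enumerate(models)
    ])

def lasso_ista(F, t, lam=0.1, n_iter=500):
    """Sparse weights minimizing 0.5/n * ||F w - t||^2 + lam * ||w||_1 via ISTA."""
    n, p = F.shape
    w = np.zeros(p)
    step = n / (np.linalg.norm(F, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = F.T @ (F @ w - t) / n
        z = w - step * grad
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return w
```

In this sketch, each row of the feature matrix `F` would hold the `layer_features` of one prompt, `t` would hold the correctness labels, and the nonzero entries of the returned weights indicate which layers contribute to the global uncertainty score. The log-likelihood-ratio variant mentioned in the TL;DR would presumably contrast two such models per layer, but that detail is not spelled out in this section and is left out here.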

Paper Structure

This paper contains 19 sections, 6 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: UQ Decomposition in Learning Systems
  • Figure 2: Layers of the activated neurons, Llama-3.1-8B-Instruct, raw neurons, in the answer log-likelihood ratio setting.
  • Figure 3: Layers of the activated neurons, Llama-3.1-8B-Instruct, density, in the answer log-likelihood ratio setting.
  • Figure 4: Layers of the activated neurons, Llama-3.1-8B-Instruct, truncated regression, in the answer log-likelihood ratio setting.
  • Figure 5: Layers of the activated neurons, Llama-3.1-8B-Instruct, truncated ridge, in the answer log-likelihood ratio setting.
  • ...and 4 more figures