Eliciting Numerical Predictive Distributions of LLMs Without Autoregression

Julianna Piskorz, Katarzyna Kobalczyk, Mihaela van der Schaar

TL;DR

This work investigates whether distributional properties of LLM predictions can be recovered without explicit autoregressive generation. The results suggest that LLM embeddings carry informative signals about summary statistics of their predictive distributions, including numerical uncertainty.

Abstract

Large Language Models (LLMs) have recently been successfully applied to regression tasks -- such as time series forecasting and tabular prediction -- by leveraging their in-context learning abilities. However, their autoregressive decoding process may be ill-suited to continuous-valued outputs, where obtaining predictive distributions over numerical targets requires repeated sampling, leading to high computational cost and inference time. In this work, we investigate whether distributional properties of LLM predictions can be recovered without explicit autoregressive generation. To this end, we study a set of regression probes trained to predict statistical functionals (e.g., mean, median, quantiles) of the LLM's numerical output distribution directly from its internal representations. Our results suggest that LLM embeddings carry informative signals about summary statistics of their predictive distributions, including the numerical uncertainty. This investigation opens up new questions about how LLMs internally encode uncertainty in numerical tasks, and about the feasibility of lightweight alternatives to sampling-based approaches for uncertainty-aware numerical predictions.
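To make the probing idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a linear ridge probe, synthetic Gaussian "embeddings" standing in for LLM hidden states, and simulated output samples standing in for repeated autoregressive sampling. All names (`sample_targets`, `fit_ridge_probe`, `predict`) and all sizes are hypothetical. The probe is fit to sample-based mean, median and interquartile range (IQR), then evaluated on held-out inputs.

```python
import numpy as np


def sample_targets(samples):
    """Empirical mean, median and IQR of each row of sampled outputs."""
    q25, q50, q75 = np.percentile(samples, [25, 50, 75], axis=1)
    return np.column_stack([samples.mean(axis=1), q50, q75 - q25])


def fit_ridge_probe(H, Y, lam=1.0):
    """Closed-form ridge regression from embeddings H to targets Y.

    A constant column is appended as a bias term; for simplicity the
    bias is regularised together with the other weights.
    """
    Hb = np.column_stack([H, np.ones(len(H))])
    A = Hb.T @ Hb + lam * np.eye(Hb.shape[1])
    return np.linalg.solve(A, Hb.T @ Y)


def predict(W, H):
    """Apply a fitted probe to new embeddings."""
    Hb = np.column_stack([H, np.ones(len(H))])
    return Hb @ W


# Synthetic stand-ins (assumptions, not real LLM data):
# H plays the role of hidden states; `samples` plays the role of
# k numerical outputs drawn from the LLM for each of n prompts.
rng = np.random.default_rng(0)
n, d, k = 2000, 32, 50
H = rng.normal(size=(n, d))
w_loc = rng.normal(size=d)
w_scale = 0.1 * rng.normal(size=d)
loc = H @ w_loc                    # location depends on the embedding
scale = np.exp(H @ w_scale)        # spread depends on the embedding
samples = loc[:, None] + scale[:, None] * rng.normal(size=(n, k))

Y = sample_targets(samples)        # columns: mean, median, IQR
n_train = 1500
W = fit_ridge_probe(H[:n_train], Y[:n_train])
pred = predict(W, H[n_train:])

# Held-out correlation between probe predictions and sample-based targets.
corr = [float(np.corrcoef(pred[:, j], Y[n_train:, j])[0, 1]) for j in range(3)]
print(dict(zip(["mean", "median", "iqr"], corr)))
```

Once trained, such a probe needs a single forward pass to produce the embedding, whereas the sample-based targets require `k` generations per input. That gap is the computational saving the abstract refers to. Whether a linear probe suffices on real LLM representations is an empirical question this sketch does not address.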

Paper Structure (54 sections, 7 equations, 12 figures, 13 tables)

This paper contains 54 sections, 7 equations, 12 figures, 13 tables.

Figures (12)

  • Figure 1: Illustration of this paper's goals and methodology.
  • Figure 2: Predicted vs. true values of mean, median and greedy prediction, presented on $\log_{10}$ scale. The probing model accurately recovers the number that the LLM intends to predict, indicating that the internal representations encode the order of magnitude of prediction.
  • Figure 3: Predicted vs. sample-based IQR (both median-normalised). The model accurately tracks the variability of the LLM's output distribution.
  • Figure 4: The probe achieves comparable error to using 20-25 LLM samples on the one step ahead prediction task.
  • Figure 5: Generalisation to unseen context lengths. A probe trained on a restricted context length range (Restricted) exhibits greater deviation in empirical coverage outside its training range.
  • ...and 7 more figures