LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language

James Requeima; John Bronskill; Dami Choi; Richard E. Turner; David Duvenaud

TL;DR

The paper addresses the challenge of incorporating expert priors expressed in natural language into probabilistic regression by introducing LLM Processes (LLMPs), which yield joint numerical predictive distributions from large language models conditioned on data and text. It defines two joint-distribution schemes (I-LLMP and A-LLMP), proposes a logit-based, bin-discretized density elicitation, and demonstrates how LLMPs can model multi-dimensional targets with well-calibrated uncertainty. Through extensive experiments across synthetic functions, images, and real-world tasks, LLMPs show competitive performance with Gaussian Processes and can leverage textual information to inject problem structure and priors. The work highlights the potential and limitations of text-conditioned probabilistic regression, offering a pathway for non-experts to access flexible, zero-shot probabilistic modeling in diverse domains. The findings suggest practical impact in forecasting, optimization, and multi-modal prediction, while underscoring computational costs and ethical considerations around deployment and biases.
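The logit-based, bin-discretized density mentioned above can be illustrated with a minimal sketch. It assumes a simplified setting that the summary does not specify: non-negative numbers written in a fixed decimal format, one token per digit, and a hypothetical `digit_logits` callable standing in for the LLM's next-token logits restricted to the ten digit tokens. The probability of a digit string is the product of per-digit softmax probabilities, and dividing by the bin width (set by the number of decimal places) turns that probability mass into a density.

```python
import math
from typing import Callable, Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bin_log_density(
    value_str: str,
    digit_logits: Callable[[str], Sequence[float]],
) -> float:
    """Log density of the bin named by a digit string such as "3.14".

    digit_logits is a hypothetical stand-in for an LLM: given the text
    generated so far, it returns 10 logits, one per digit 0-9.
    """
    log_prob = 0.0
    prefix = ""
    for ch in value_str:
        if ch.isdigit():
            # Probability of this digit given everything written so far.
            probs = softmax(digit_logits(prefix))
            log_prob += math.log(probs[int(ch)])
        prefix += ch
    # A string with n decimal places names a bin of width 10^-n.
    decimals = len(value_str.split(".")[1]) if "." in value_str else 0
    bin_width = 10.0 ** (-decimals)
    # density = probability mass / bin width
    return log_prob - math.log(bin_width)


def uniform_digit_logits(prefix: str) -> list[float]:
    """Toy model (assumption): every digit is equally likely."""
    return [0.0] * 10


log_density = bin_log_density("3.14", uniform_digit_logits)
```

With the toy uniform model and the one-integer-digit format, every value in [0, 10) receives density 0.1, which is a quick sanity check that the bin-width correction is applied in the right direction. A real model would also need handling for signs, variable-length numbers, and termination tokens, all of which this sketch leaves out.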

Abstract

Machine learning practitioners often face significant challenges in formally integrating their prior knowledge and beliefs into predictive models, limiting the potential for nuanced and context-aware analyses. Moreover, the expertise needed to integrate this prior knowledge into probabilistic modeling typically limits the application of these models to specialists. Our goal is to build a regression model that can process numerical data and make probabilistic predictions at arbitrary locations, guided by natural language text which describes a user's prior knowledge. Large Language Models (LLMs) provide a useful starting point for designing such a tool since they 1) provide an interface where users can incorporate expert insights in natural language and 2) provide an opportunity for leveraging latent problem-relevant knowledge encoded in LLMs that users may not have themselves. We start by exploring strategies for eliciting explicit, coherent numerical predictive distributions from LLMs. We examine these joint predictive distributions, which we call LLM Processes, over arbitrarily-many quantities in settings such as forecasting, multi-dimensional regression, black-box optimization, and image modeling. We investigate the practical details of prompting to elicit coherent predictive distributions, and demonstrate their effectiveness at regression. Finally, we demonstrate the ability to usefully incorporate text into numerical predictions, improving predictive performance and giving quantitative structure that reflects qualitative descriptions. This lets us begin to explore the rich, grounded hypothesis space that LLMs implicitly encode.
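As a rough illustration of what a joint predictive distribution over several target values can look like, the two schemes named in the TL;DR (I-LLMP and A-LLMP) correspond to the two standard ways of building a joint from one-dimensional conditionals. The notation here is an assumption for illustration only: $D$ is the observed numerical data, $T$ is the conditioning text, and $p_{\mathrm{LLM}}$ is the marginal density elicited from the model.

```latex
% Independent marginals (I-LLMP): each target is predicted on its own.
p_{\mathrm{I}}(y_{1:N} \mid x_{1:N}, D, T)
  = \prod_{i=1}^{N} p_{\mathrm{LLM}}(y_i \mid x_i, D, T)

% Autoregressive (A-LLMP): each target also conditions on earlier targets.
p_{\mathrm{A}}(y_{1:N} \mid x_{1:N}, D, T)
  = \prod_{i=1}^{N} p_{\mathrm{LLM}}(y_i \mid y_{1:i-1}, x_{1:i}, D, T)
```

The autoregressive form follows the chain rule of probability, so it can capture dependence between targets, but its value may depend on the order in which the targets are presented.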

Paper Structure (39 sections, 4 equations, 44 figures, 9 tables, 3 algorithms)

Introduction
LLM Processes: Defining a Stochastic Process That Can Condition on Text
LLMP Configuration
Evaluating LLMP Performance on Numerical Data
Conditioning LLMPs on Textual Information
Related Work
Discussion, Limitations, and Societal Impact
LLM Processes: Defining a Stochastic Process That Can Condition on Text
Continuous Marginal Likelihoods From an LLM
The LLM Process Method
Defining an LLM Process
LLM Processes Pseudocode
Sample Prompts
Dataset Details
Function Dataset
...and 24 more sections

Figures (44)

Figure 1: Predictive distributions from an LLMP conditioned on both data and text information. The tenth-percentiles from 50 samples are visualized in faded blue and the median is presented in dark blue with five random samples shown in various colours.
Figure 2: Sampling from an LLM using either independent marginal or autoregressive sampling.
Figure 3: NLL and MAE for various prompt formats ordered from most to least token-efficient (left), training data orderings (middle), and prompt $y$-scaling (right) using the Mixtral-8x7B LLM. The height of each bar is the mean of 10 random seeds that determine the training point locations. The vertical black lines indicate the standard error. In the Prompt Formatting legend (left), the two '_' characters indicate the positions of the $x$ and $y$ values and \n represents a newline terminal token.
Figure 4: Autoregressive Experiments. Left: NLL and MAE for A-LLMP and I-LLMP under different prompt orderings with the Mixtral-8x7B LLM. The height of each bar is the mean of 3 random seeds that determine the training point locations. The black lines indicate the standard error. Center: Log-likelihood results for various test set orderings with Llama-2-7B, Llama-2-70B and Mixtral-8x7B A-LLMP. The orange X indicates I-LLMP, the purple circles use distance-ordered test points, and the blue whiskers are the mean and standard error of 10 randomly sampled test orderings. The red dashed line shows the log-likelihood of the test set under the generative process. Right: Heatmap visualization of the Llama-3-70B A-LLMP predictive distribution conditioned on data from a bimodal generative process. Black dots are training points.
Figure 5: Comparison of A-LLMP and LLMTime on the weather dataset. Left: Plot using all 50 training points. Right: Plot of MAE and NLL versus the amount of training data removed. A-LLMP has lower MAE and NLL and the margin over LLMTime increases as more training data is removed.
...and 39 more figures
