JoLT: Joint Probabilistic Predictions on Tabular Data Using LLMs

Aliaksandra Shysheya, John Bronskill, James Requeima, Shoaib Ahmed Siddiqui, Javier Gonzalez, David Duvenaud, Richard E. Turner

TL;DR

JoLT (Joint LLM Process for Tabular data) is a simple, prompt-based framework that uses the in-context learning of LLMs to produce joint probabilistic predictions over heterogeneous tabular outputs, with no model training or data preprocessing. It handles missing data implicitly, uses textual side information to refine predictions, and supports both sample-based inference and full predictive distributions computed from LLM logits, which enables uncertainty quantification. On low-shot classification and multi-target tasks, JoLT often surpasses strong baselines, particularly when side information is available, and it is competitive on data imputation. The work highlights practical advantages for real-world prediction problems, acknowledges context-size and computational constraints, and outlines directions for scaling to larger models and richer side information.

Abstract

We introduce a simple method for probabilistic predictions on tabular data based on Large Language Models (LLMs) called JoLT (Joint LLM Process for Tabular data). JoLT uses the in-context learning capabilities of LLMs to define joint distributions over tabular data conditioned on user-specified side information about the problem, exploiting the vast repository of latent problem-relevant knowledge encoded in LLMs. JoLT defines joint distributions for multiple target variables with potentially heterogeneous data types without any data conversion, data preprocessing, special handling of missing data, or model training, making it accessible and efficient for practitioners. Our experiments show that JoLT outperforms competitive methods on low-shot single-target and multi-target tabular classification and regression tasks. Furthermore, we show that JoLT can automatically handle missing data and perform data imputation by leveraging textual side information. We argue that due to its simplicity and generality, JoLT is an effective approach for a wide variety of real prediction problems.

Paper Structure

This paper contains 28 sections, 2 equations, 10 figures, 9 tables.

Figures (10)

  • Figure 1: Mapping tabular data to a prompt $P$. The diagram on the left depicts a tabular dataset where the first $N$ rows are training examples with $F$ features and $T$ targets fully observed. The last row represents a test example with $F$ observed features and unobserved targets. The prompt at the top right is formed by serializing the $N$ training examples and the test features into a single string. The prompt is input to a pretrained LLM that will generate the targets to complete the test example. See \ref{tab:nomenclature} for nomenclature.
  • Figure 2: Missing data handling. The diagram depicts a tabular dataset with 4 rows of training examples. The last row represents a test example with 2 unobserved targets. Empty cells represent data that are missing completely at random: 40$\%$ of the feature cells (8 out of 20). Note that both training and test examples are affected by missing data. The prompt is formed by simply omitting the missing cells.
  • Figure 3: JoLT imputation. To impute values for a given row (highlighted in pink), the features are reordered so that the features observed in that row come first. The prompt for the LLM is constructed as in \ref{fig:missing_data_handling}, except that instead of predicting targets $Y_i$, the model predicts the missing values of that row (e.g., columns $X_1$ and $X_4$ in the diagram).
  • Figure 4: Area Under the Receiver Operating Characteristic Curve (AUC) as a function of the number of shots for JoLT using two different LLMs and three competitive methods. The solid line and dots indicate the mean over 5 seeds, which affect the training shot selection, and the shaded region shows a confidence interval of one $\sigma$. Results for the competitive methods are from \cite{hegselmann2023tabllm}. Tabular results are in \ref{tab:classification_results}.
  • Figure 5: Results for predicting two target columns from the Wine Quality dataset \cite{wine_quality_186} as a function of the number of shots when evaluating on 1000 test examples. The first target column is numerical (Alcohol $\%$) and is evaluated with Mean Absolute Error (MAE); the second target column is categorical (Quality on a scale of 1 to 10) and is evaluated with classification accuracy. The joint NLL is over both targets. The JoLT methods use the Gemma-2-27B LLM. JoLT (Text) utilized both prefix text $\langle prefix \rangle$ and text from the column headers $X_j, Y_j$, whereas JoLT (No Text) did not. The solid line and dots indicate the mean over 5 seeds, which affect the training shot and test example selection, and the shaded region shows a confidence interval of one $\sigma$. Tabular results are in \ref{tab:multi_column_results}.
  • ...and 5 more figures
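
As a rough illustration of the prompt construction described in the captions of Figures 1 to 3, the Python sketch below serializes training rows and a test row into a single string, skips missing cells, and reorders columns for imputation. The `name: value` text format, the `; ` separator, and the function names `serialize_row`, `build_prompt`, and `order_for_imputation` are assumptions made for illustration; they are not the paper's actual template or code.

```python
from typing import Mapping, Optional, Sequence

# A row maps column names to values; None (or an absent key) marks a missing cell.
Row = Mapping[str, Optional[object]]


def serialize_row(row: Row, columns: Sequence[str]) -> list:
    """Return 'name: value' strings for the observed cells, in column order.

    Missing cells are simply omitted (the idea in the Figure 2 caption).
    """
    return [f"{c}: {row[c]}" for c in columns if row.get(c) is not None]


def build_prompt(
    train_rows: Sequence[Row],
    test_row: Row,
    features: Sequence[str],
    targets: Sequence[str],
    prefix: str = "",
) -> str:
    """Serialize training rows and the test features into one prompt string.

    The prompt ends with the first target name left open, so that a language
    model would continue the text with a value for it (hypothetical format).
    """
    lines = [prefix] if prefix else []
    all_columns = list(features) + list(targets)
    for row in train_rows:
        lines.append("; ".join(serialize_row(row, all_columns)))
    # Test row: observed features only, then the open slot for the first target.
    test_parts = serialize_row(test_row, features) + [f"{targets[0]}:"]
    lines.append("; ".join(test_parts))
    return "\n".join(lines)


def order_for_imputation(row: Row, columns: Sequence[str]) -> list:
    """Put the columns observed in `row` first and its missing columns last.

    With this ordering the missing cells can play the role of targets
    (the idea in the Figure 3 caption).
    """
    observed = [c for c in columns if row.get(c) is not None]
    missing = [c for c in columns if row.get(c) is None]
    return observed + missing


if __name__ == "__main__":
    train = [
        {"age": 39, "hours": None, "income": ">50K"},
        {"age": 25, "hours": 40, "income": "<=50K"},
    ]
    test = {"age": 52, "hours": 45}
    print(build_prompt(train, test, ["age", "hours"], ["income"], "Predict income."))
    print(order_for_imputation({"x1": None, "x2": 3.1, "x3": 7, "x4": None},
                               ["x1", "x2", "x3", "x4"]))
```

In this sketch the remaining targets would be generated one after another by appending each sampled value and the next target name to the prompt, which is one plausible way to obtain a joint prediction over several targets; the sketch itself stops at building the initial prompt and does not call any model.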