AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling

Alexander Capstick, Rahul G. Krishnan, Payam Barnaghi

TL;DR

AutoElicit yields priors that can substantially reduce error over uninformative priors while using fewer labels, and that consistently outperform in-context learning.

Abstract

Large language models (LLMs) acquire a breadth of information across various domains. However, their computational complexity, cost, and lack of transparency often hinder their direct application for predictive tasks where privacy and interpretability are paramount. In fields such as healthcare, biology, and finance, specialised and interpretable linear models still hold considerable value. In such domains, labelled data may be scarce or expensive to obtain. Well-specified prior distributions over model parameters can reduce the sample complexity of learning through Bayesian inference; however, eliciting expert priors can be time-consuming. We therefore introduce AutoElicit to extract knowledge from LLMs and construct priors for predictive models. We show these priors are informative and can be refined using natural language. We perform a careful study contrasting AutoElicit with in-context learning and demonstrate how to perform model selection between the two methods. We find that AutoElicit yields priors that can substantially reduce error over uninformative priors, using fewer labels, and consistently outperform in-context learning. We show that AutoElicit saves over 6 months of labelling effort when building a new predictive model for urinary tract infections from sensor recordings of people living with dementia.
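The claim that a well-specified prior reduces the number of labels needed can be illustrated with a toy conjugate update. The sketch below is not the AutoElicit procedure: it assumes a one-parameter linear model with known noise variance and a single Gaussian prior, and the data, the prior settings, and the function name `gaussian_posterior` are all hypothetical choices made for illustration.

```python
# Minimal sketch: conjugate Gaussian update for y = theta * x + noise,
# with known noise variance. Illustrative only; not the paper's method.
# All numbers below are made-up example values.

def gaussian_posterior(prior_mean, prior_var, xs, ys, noise_var=1.0):
    """Return (mean, variance) of the Gaussian posterior over theta."""
    prior_precision = 1.0 / prior_var
    # Precision contributed by the observed labels.
    data_precision = sum(x * x for x in xs) / noise_var
    post_var = 1.0 / (prior_precision + data_precision)
    # Precision-weighted combination of the prior mean and the data.
    weighted_sum = prior_mean * prior_precision
    weighted_sum += sum(x * y for x, y in zip(xs, ys)) / noise_var
    return post_var * weighted_sum, post_var

TRUE_THETA = 2.0            # hypothetical ground truth used to generate ys
xs = [1.0, -0.5, 0.8]       # only three labelled examples
ys = [2.3, -0.8, 1.5]

# Uninformative prior N(0, 1) versus a hypothetical informative prior N(1.8, 0.5^2).
uninformative = gaussian_posterior(0.0, 1.0, xs, ys)
informative = gaussian_posterior(1.8, 0.25, xs, ys)

print("uninformative posterior mean:", round(uninformative[0], 3))
print("informative posterior mean:  ", round(informative[0], 3))
```

With only three labels, the posterior mean under the informative prior sits closer to the assumed true parameter than the one under the uninformative prior. This is the effect the abstract describes, in its simplest form.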

Paper Structure

This paper contains 54 sections, 14 equations, 21 figures, 4 tables.

Figures (21)

  • Figure 1: AutoElicit: Prior elicitation using a language model. The left-hand figure demonstrates the process, whilst the right-hand figure illustrates the benefits of using AutoElicit, achieving the same peak accuracy after $n \approx 15$ labels, 220 days earlier in the study.
  • Figure 2: Test performance of the posterior distribution for varied training data sizes. The average and 95% confidence interval of the mean posterior accuracy or Mean Squared Error (MSE) of $10$ splits of the dataset using AutoElicit priors and an uninformative prior ($\theta \sim \mathcal{N}(\theta | 0,1)$). These are calculated on a test set of each dataset, with the green arrow pointing in the direction of metric improvement.
  • Figure 3: UTI prior parameters with expert information. The histograms in the three left-hand plots show the distribution of parameter values for the features that we provide expert information for in the task description. In all three cases, we state that the features are positively correlated with UTI risk. In the right-hand plot, we show the posterior accuracy of these distributions for different numbers of observed samples. The average and 95% confidence interval of the mean posterior accuracy over the $10$ test splits are shown.
  • Figure 4: Extraction of GPT-3.5-turbo's in-context prior. The bottom row shows the MSE between the LLM's in-context predictions and the MLE model's predictions, whilst the top row shows the parameter distribution of these MLE models, with each colour representing a single feature. For ease of visualisation, we exclude values outside of the $(2.5\%, ~97.5\%)$ percentiles and the bias term.
  • Figure 5: Extraction of GPT-3.5-turbo's in-context posterior. The bottom row shows the MSE between the LLM's in-context predictions and the MLE model's predictions, whilst the top row shows the parameter distribution of these MLE models, with each colour representing a single feature. For ease of visualisation, we exclude values outside of the $(2.5\%, ~97.5\%)$ percentiles and the bias term.
  • ...and 16 more figures
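The captions for Figures 2 and 3 report an average and 95% confidence interval of a metric over $10$ splits. One way such a summary could be computed is sketched below. The listing does not state how the intervals were obtained, so the Student-t interval is an assumption, and the per-split accuracies and the name `mean_with_ci` are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 9 degrees of freedom (n = 10 splits).
# Assumption: a t-interval is used; the source does not specify the method.
T_CRIT_95_DF9 = 2.262

def mean_with_ci(values, t_crit=T_CRIT_95_DF9):
    """Return (mean, lower, upper) for a confidence interval of the mean."""
    n = len(values)
    centre = statistics.mean(values)
    # Standard error of the mean from the sample standard deviation.
    half_width = t_crit * statistics.stdev(values) / math.sqrt(n)
    return centre, centre - half_width, centre + half_width

# Hypothetical mean posterior accuracies, one per test split.
split_accuracies = [0.71, 0.68, 0.74, 0.70, 0.73, 0.69, 0.72, 0.75, 0.67, 0.71]

mean_acc, lower, upper = mean_with_ci(split_accuracies)
print(f"accuracy = {mean_acc:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The default critical value only applies to ten splits; a different number of splits would need the matching t value.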