Had enough of experts? Quantitative knowledge retrieval from large language models

David Selby; Kai Spriestersbach; Yuichiro Iwashita; Mohammad Saad; Dennis Bappert; Archana Warrier; Sumantrak Mukherjee; Koichi Kise; Sebastian Vollmer

TL;DR

The paper investigates whether large language models can serve as quantitative knowledge sources, both by eliciting expert-like prior distributions and by performing zero-shot missing-data imputation within Bayesian workflows. It introduces a prompting-and-serialization framework for extracting parametric priors and imputed values, and evaluates informativeness, calibration, and imputation quality across diverse real-world datasets, including OpenML-CC18. Results vary substantially from model to model, and LLMs show limited success on upstream imputation accuracy; downstream tasks sometimes benefit from the imputed data, but data leakage and domain dependence pose significant challenges. The work demonstrates that LLM-driven quantitative knowledge retrieval is feasible while highlighting practical limitations and directions for refinement, such as domain-specific tuning, hybrid methods, and robust benchmarking on curated data.
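The serialization step can be illustrated with a minimal Python sketch. It assumes a record arrives as a dictionary in which None or NaN marks a missing entry; the prompt wording, the helper names (serialize_row, parse_number, impute) and the ask_llm callable are hypothetical stand-ins, not the paper's prompt templates or any particular LLM client.

```python
import math
import re
from typing import Callable, Optional

def serialize_row(row: dict, target: str) -> str:
    """Describe the observed fields of one record in plain English and ask
    for the missing `target` value. Template wording is illustrative only."""
    observed = [
        f"{name} is {value}"
        for name, value in row.items()
        if name != target
        and value is not None
        and not (isinstance(value, float) and math.isnan(value))
    ]
    return (
        "Here is a record: " + "; ".join(observed) + ". "
        f"Estimate the missing value of '{target}'. Reply with a single number."
    )

def parse_number(reply: str) -> Optional[float]:
    """Return the first number found in the model's reply, or None."""
    match = re.search(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?", reply)
    return float(match.group()) if match else None

def impute(row: dict, target: str, ask_llm: Callable[[str], str]) -> Optional[float]:
    """Zero-shot imputation: serialize, query the (hypothetical) LLM, parse."""
    return parse_number(ask_llm(serialize_row(row, target)))

# Usage with a stub in place of a real model call
row = {"age": 54, "sex": "male", "cholesterol": None}
print(serialize_row(row, "cholesterol"))
print(impute(row, "cholesterol", lambda prompt: "Roughly 230 mg/dl."))
```

Taking only the first number in the reply is a deliberate simplification: a reply such as "1,200" would be read as 1, so a real pipeline would need stricter output constraints or validation.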

Abstract

Large language models (LLMs) have been extensively studied for their abilities to generate convincing natural language sequences; however, their utility for quantitative information retrieval is less well understood. Here we explore the feasibility of LLMs as a mechanism for quantitative knowledge retrieval to aid two data analysis tasks: elicitation of prior distributions for Bayesian models and imputation of missing data. We introduce a framework that leverages LLMs to enhance Bayesian workflows by eliciting expert-like prior knowledge and imputing missing data. Tested on diverse datasets, this approach can improve predictive accuracy and reduce data requirements, offering significant potential in healthcare, environmental science and engineering applications. We discuss the implications and challenges of treating LLMs as 'experts'.
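One way to see how an elicited prior can reduce data requirements is the conjugate Beta-Binomial model, where the prior's α + β (the effective sample size plotted in Figure 3) acts like a count of pseudo-observations. The sketch below assumes the expert, human or LLM, supplies a prior mean and an effective sample size; the helper names are hypothetical, and this parametrization is an illustration rather than the paper's elicitation format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BetaPrior:
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def effective_sample_size(self) -> float:
        # alpha + beta: roughly how many observations the prior is "worth"
        return self.alpha + self.beta

def from_mean_and_ess(mean: float, ess: float) -> BetaPrior:
    """Hypothetical helper: Beta(alpha, beta) from an elicited mean and ESS."""
    if not (0.0 < mean < 1.0) or ess <= 0:
        raise ValueError("need 0 < mean < 1 and ess > 0")
    return BetaPrior(alpha=mean * ess, beta=(1.0 - mean) * ess)

def update(prior: BetaPrior, successes: int, trials: int) -> BetaPrior:
    """Conjugate Beta-Binomial update after observing successes out of trials."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return BetaPrior(prior.alpha + successes, prior.beta + (trials - successes))

prior = from_mean_and_ess(0.25, 20)  # Beta(5, 15)
posterior = update(prior, successes=3, trials=10)
print(posterior, posterior.mean, posterior.effective_sample_size)
```

Here 3 successes in 10 trials move Beta(5, 15) to Beta(8, 22): the posterior mean is 8/30, about 0.267, and the prior still contributes 20 of the 30 effective observations, which is the sense in which an informative prior substitutes for data.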

Paper Structure (20 sections, 2 equations, 7 figures, 1 table, 4 algorithms)

Introduction
Related work
Methods
Overview
Eliciting priors and imputed values from LLMs
Evaluating expert priors
Prior elicitation experiments
Evaluating missing data imputation
Imputation experiments
Results
Prior elicitation
Missing value imputation
Investigating data leakage
Conclusion and further lines of research
Prompting for prior elicitation
...and 5 more sections

Figures (7)

Figure 1: Priors for Cohen's $\delta$ (top) and Pearson correlations (bottom) elicited from LLM and human experts in psychology. Dashed lines denote a SHELF-like elicitation protocol
Figure 2: Benefit of LLM priors for weather forecasting: number of observations needed for a frequentist model to achieve a lower MSE than the prior predictive distribution
Figure 3: Distribution of prior effective sample size ($\alpha+\beta$) for beta priors on various tasks. Outliers are omitted
Figure 4: Upstream RMSE performance across LLMs, KNN and random forest
Figure 5: Downstream macro-$F_1$ change of LLMs
...and 2 more figures
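Figures 4 and 5 report upstream RMSE on imputed values and downstream macro-$F_1$. Below is a minimal dependency-free sketch of both metrics; it assumes RMSE is computed over the originally missing (masked) cells only and without any normalization, which the paper may handle differently. In practice, sklearn.metrics.f1_score(y_true, y_pred, average="macro") computes a comparable macro average.

```python
import math
from typing import Hashable, Sequence

def rmse(true_values: Sequence[float], imputed: Sequence[float]) -> float:
    """Root-mean-square error over the cells that were masked and then imputed."""
    if len(true_values) != len(imputed) or len(true_values) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(true_values, imputed)]
    return math.sqrt(sum(squared) / len(squared))

def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Macro averaging weights every class equally regardless of its frequency, so it is sensitive to how imputation affects minority classes in a downstream classifier.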