Had enough of experts? Quantitative knowledge retrieval from large language models

David Selby, Kai Spriestersbach, Yuichiro Iwashita, Mohammad Saad, Dennis Bappert, Archana Warrier, Sumantrak Mukherjee, Koichi Kise, Sebastian Vollmer

TL;DR

The paper investigates whether large language models can serve as quantitative knowledge sources by eliciting expert-like priors and performing zero-shot missing-data imputation within Bayesian workflows. It introduces a prompting-and-serialization framework to extract parametric priors and imputations, and evaluates informativeness, calibration, and imputation quality across diverse real-world datasets, including OpenML-CC18. Findings show substantial model-to-model variation and limited upstream imputation success for LLMs, with downstream tasks sometimes benefiting from imputed data, but data leakage and domain-dependence pose significant challenges. The work demonstrates feasibility and potential of LLM-driven quantitative knowledge retrieval while highlighting practical limitations and directions for refinement, such as domain-specific tuning, hybrid methods, and robust benchmarking with curated data.

Abstract

Large language models (LLMs) have been extensively studied for their abilities to generate convincing natural language sequences; however, their utility for quantitative information retrieval is less well understood. Here we explore the feasibility of LLMs as a mechanism for quantitative knowledge retrieval to aid two data analysis tasks: elicitation of prior distributions for Bayesian models and imputation of missing data. We introduce a framework that leverages LLMs to enhance Bayesian workflows by eliciting expert-like prior knowledge and imputing missing data. Tested on diverse datasets, this approach can improve predictive accuracy and reduce data requirements, offering significant potential in healthcare, environmental science and engineering applications. We discuss the implications and challenges of treating LLMs as 'experts'.

Paper Structure

This paper contains 20 sections, 2 equations, 7 figures, 1 table, 4 algorithms.

Figures (7)

  • Figure 1: Priors for Cohen's $\delta$ (top) and Pearson correlations (bottom) elicited from LLM and human experts in psychology. Dashed lines denote a SHELF-like elicitation protocol
  • Figure 2: Benefit of LLM priors for weather forecasting: number of observations needed for a frequentist model to achieve better MSE than the prior predictive distribution
  • Figure 3: Distribution of prior effective sample size ($\alpha+\beta$) for beta priors on various tasks. Outliers are omitted
  • Figure 4: Upstream RMSE performance across LLMs, KNN and random forest
  • Figure 5: Downstream macro-$F_1$ change of LLMs
  • ...and 2 more figures
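
The caption of Figure 3 measures the informativeness of a beta prior by its effective sample size, $\alpha+\beta$. As a minimal sketch of why that sum behaves like a sample size, the Python below uses the standard beta-binomial conjugate update: a Beta($\alpha$, $\beta$) prior acts like $\alpha+\beta$ pseudo-observations, so the posterior mean is a weighted average of the prior mean and the observed proportion. The function names and the example numbers are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch only: helper names and example values are hypothetical,
# not the paper's code. Standard beta-binomial conjugacy is assumed.

def beta_prior_summary(alpha, beta):
    """Return (prior mean, effective sample size) of a Beta(alpha, beta) prior."""
    ess = alpha + beta          # effective sample size: number of pseudo-observations
    return alpha / ess, ess


def beta_posterior(alpha, beta, successes, trials):
    """Conjugate update after observing `successes` out of `trials` Bernoulli trials."""
    return alpha + successes, beta + (trials - successes)


def prior_weight(alpha, beta, trials):
    """Weight the prior mean receives in the posterior mean: ESS / (ESS + n)."""
    ess = alpha + beta
    return ess / (ess + trials)


if __name__ == "__main__":
    # Hypothetical elicited prior Beta(2, 8) and hypothetical data: 3 successes in 10 trials.
    mean, ess = beta_prior_summary(2, 8)
    post_a, post_b = beta_posterior(2, 8, successes=3, trials=10)
    w = prior_weight(2, 8, trials=10)
    print(f"prior mean={mean:.2f}, ESS={ess}")
    print(f"posterior=Beta({post_a}, {post_b}), posterior mean={post_a / (post_a + post_b):.2f}")
    print(f"weight on prior mean={w:.2f}")
```

Under these assumptions, a prior with a large $\alpha+\beta$ needs correspondingly more real observations before the data dominate the posterior, which is why the sum is a natural summary of how strongly an elicited prior commits to its mean.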