Eliciting the Priors of Large Language Models using Iterated In-Context Learning

Jian-Qiao Zhu, Thomas L. Griffiths

TL;DR

This work introduces iterated in-context learning as a prompt-based, MCMC-like method to elicit the implicit priors of large language models. By validating on tasks with known human priors (causal strengths, proportions, and everyday quantities) and extending to speculative events, the authors show that GPT-4's priors qualitatively mirror human priors and can outperform simple baselines. The approach suggests LLMs encode human-like probabilistic beliefs and can serve as surrogates to study priors when direct measurement is difficult. The findings bear on understanding model decision-making, interpreting automated science, and examining the role of LLMs as cultural technologies; the authors also acknowledge methodological and theoretical limitations.

Abstract

As Large Language Models (LLMs) are increasingly deployed in real-world settings, understanding the knowledge they implicitly use when making decisions is critical. One way to capture this knowledge is in the form of Bayesian prior distributions. We develop a prompt-based workflow for eliciting prior distributions from LLMs. Our approach is based on iterated learning, a Markov chain Monte Carlo method in which successive inferences are chained in a way that supports sampling from the prior distribution. We validated our method in settings where iterated learning has previously been used to estimate the priors of human participants -- causal learning, proportion estimation, and predicting everyday quantities. We found that priors elicited from GPT-4 qualitatively align with human priors in these settings. We then used the same method to elicit priors from GPT-4 for a variety of speculative events, such as the timing of the development of superhuman AI.
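The claim that chaining inferences yields samples from the prior can be checked in a few lines. The sketch below assumes an idealized learner that samples a hypothesis $h$ from its posterior $p(h \mid d)$, after which new data $d$ are generated from $p(d \mid h)$; the symbol $d$ and the transition kernel $T$ are notation introduced here for illustration, and an LLM need not behave exactly like this idealized learner.

```latex
% Transition kernel of the chain over hypotheses (d = data, notation assumed here):
T(h' \mid h) = \sum_{d} p(h' \mid d)\, p(d \mid h)

% The prior p(h) is stationary under T:
\sum_{h} T(h' \mid h)\, p(h)
  = \sum_{d} p(h' \mid d) \sum_{h} p(d \mid h)\, p(h)
  = \sum_{d} p(h' \mid d)\, p(d)
  = p(h')
```

Under these assumptions the chain is a Gibbs sampler on the joint distribution $p(h, d)$, so after enough iterations the hypotheses it visits are distributed according to $p(h)$.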

Paper Structure (12 sections, 5 equations, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Illustration of an iterated in-context learning procedure to elicit the implicit prior of an LLM regarding male lifespan. At each iteration, the LLM is given the current age of a random man and is prompted to predict the individual’s remaining lifespan. This predicted lifespan is then used to generate a new current age for the next iteration. The new age is a random sample from the uniform distribution between 1 and the predicted lifespan. This implements a Markov chain Monte Carlo algorithm for sampling from the prior $p(h)$.
  • Figure 2: Priors on causal strengths. (a) The causal graphical model. (b) Smoothed empirical estimates of human (left) and GPT-4 (right) priors on causal strength produced by iterated learning for generative cases. (c) Smoothed empirical estimates of human (left) and GPT-4 (right) priors on causal strength produced by iterated learning for preventive cases. Human data in panels (b) and (c) were adapted from yeung2015identifying.
  • Figure 3: Comparison of causal generative priors using alternative cover stories between humans and GPT-4. Human data adapted from yeung2015identifying. Detailed prompts are provided in the appendix.
  • Figure 4: Priors on proportion estimation. (a) The empirical distribution of probability-describing phrases from the British National Corpus. Figure adapted from zhu2020bayesian. (b) Example iterated learning chains for human participants estimating the proportion of binary events. Figure adapted from reali2009evolution. (c) The evolution of GPT-4’s estimation of binary events using iterated learning. (d) The histogram of GPT-4’s proportion estimation in the final (12th) iteration. (e) Example iterated learning chains for GPT-4 estimating the proportion of binary events, for comparison with human data.
  • Figure 5: Priors on everyday quantities. Each panel displays the elicited prior using iterated learning on human participants (left), the histogram of GPT-4’s final iteration of predictions (middle), and the evolution of GPT-4’s predictions across iterated learning iterations (right). Human data adapted from lewandowsky2009wisdom.
  • ...and 5 more figures
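
The procedure in Figure 1 can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: the LLM is replaced by a hypothetical stand-in (`sample_posterior`) that samples a total lifespan from an exact Bayesian posterior, and the prior (`PRIOR`, a discretized Gaussian centred on 75 years) is invented for the example. All function and variable names are hypothetical. If the chain works as described, the final-iteration predictions should recover the prior that was built into the stand-in learner.

```python
import math
import random

# Hypothetical prior over total lifespans 1..120 (an assumption for illustration,
# not the prior reported for GPT-4): a discretized Gaussian centred on 75.
LIFESPANS = list(range(1, 121))
_raw = [math.exp(-0.5 * ((h - 75) / 15) ** 2) for h in LIFESPANS]
PRIOR = [w / sum(_raw) for w in _raw]


def sample_posterior(age, rng):
    """Stand-in for the LLM: sample a total lifespan h from p(h | age).

    With age ~ Uniform{1..h}, the likelihood is 1/h for h >= age and 0 otherwise.
    """
    weights = [p / h if h >= age else 0.0 for h, p in zip(LIFESPANS, PRIOR)]
    return rng.choices(LIFESPANS, weights=weights, k=1)[0]


def run_chain(n_iterations, rng, start_age=30):
    """Alternate prediction and data generation; return the predicted lifespans."""
    age, predictions = start_age, []
    for _ in range(n_iterations):
        lifespan = sample_posterior(age, rng)  # learner's inference from the age
        predictions.append(lifespan)
        age = rng.randint(1, lifespan)         # new age ~ Uniform{1..lifespan}
    return predictions


def elicit_prior_mean(n_chains=2000, n_iterations=12, seed=0):
    """Average the final-iteration predictions across independent chains."""
    rng = random.Random(seed)
    finals = [run_chain(n_iterations, rng)[-1] for _ in range(n_chains)]
    return sum(finals) / len(finals)


# Mean of the built-in prior, for comparison with the elicited estimate.
PRIOR_MEAN = sum(h * p for h, p in zip(LIFESPANS, PRIOR))
```

Comparing `elicit_prior_mean()` with `PRIOR_MEAN` gives a quick check that the chain has forgotten its starting age. With a real LLM the posterior step would be a prompt and the prior would be unknown, so the final-iteration histogram itself serves as the estimate.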