Eliciting the Priors of Large Language Models using Iterated In-Context Learning
Jian-Qiao Zhu, Thomas L. Griffiths
TL;DR
This work introduces iterated in-context learning, a prompt-based, MCMC-like method for eliciting the implicit priors of large language models. By validating on tasks with known human priors (causal strengths, proportions, and everyday quantities) and extending to speculative events, the authors show that GPT-4's priors qualitatively mirror human priors and can outperform simple baselines. The approach suggests that LLMs encode human-like probabilistic beliefs and can serve as surrogates for studying priors when direct measurement is difficult. The findings bear on understanding model decision-making, interpreting automated science, and examining the role of LLMs as cultural technologies. The authors also acknowledge methodological and theoretical limitations.
Abstract
As Large Language Models (LLMs) are increasingly deployed in real-world settings, understanding the knowledge they implicitly use when making decisions is critical. One way to capture this knowledge is in the form of Bayesian prior distributions. We develop a prompt-based workflow for eliciting prior distributions from LLMs. Our approach is based on iterated learning, a Markov chain Monte Carlo method in which successive inferences are chained in a way that supports sampling from the prior distribution. We validated our method in settings where iterated learning has previously been used to estimate the priors of human participants -- causal learning, proportion estimation, and predicting everyday quantities. We found that priors elicited from GPT-4 qualitatively align with human priors in these settings. We then used the same method to elicit priors from GPT-4 for a variety of speculative events, such as the timing of the development of superhuman AI.
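To make the chaining idea concrete, here is a minimal sketch of iterated learning with a simulated Bayesian learner rather than an LLM. It is an illustration under stated assumptions, not the authors' workflow: the Beta(2, 5) prior, the Bernoulli data model, the number of observations per step, and the function name `iterated_learning` are all hypothetical choices. Each iteration generates data from the current hypothesis and then samples a new hypothesis from the learner's posterior. Because this alternation is a Gibbs sampler on the joint distribution of hypotheses and data, the hypotheses should end up distributed according to the learner's prior. In the prompt-based setting, the inference step would presumably be replaced by a query to the model.

```python
import random


def iterated_learning(prior_a, prior_b, n_obs, n_iter, seed=0):
    """Simulate an iterated-learning chain for a Beta-Bernoulli learner.

    Assumption for illustration: the learner holds a Beta(prior_a, prior_b)
    prior over a proportion theta and samples from its exact posterior.
    """
    rng = random.Random(seed)
    theta = rng.random()  # arbitrary starting hypothesis in [0, 1)
    samples = []
    for _ in range(n_iter):
        # Data step: draw n_obs Bernoulli outcomes from the current hypothesis.
        successes = sum(rng.random() < theta for _ in range(n_obs))
        # Inference step: sample a new hypothesis from the conjugate posterior
        # Beta(prior_a + successes, prior_b + failures).
        theta = rng.betavariate(prior_a + successes,
                                prior_b + n_obs - successes)
        samples.append(theta)
    return samples


samples = iterated_learning(prior_a=2.0, prior_b=5.0, n_obs=5, n_iter=20000)
kept = samples[1000:]  # discard early iterations as burn-in
chain_mean = sum(kept) / len(kept)
prior_mean = 2.0 / (2.0 + 5.0)
# The chain mean should land near the prior mean (about 0.286).
print(f"chain mean = {chain_mean:.3f}, prior mean = {prior_mean:.3f}")
```

The chain never observes the prior directly. It is recovered only from the learner's repeated inferences, which is what makes the procedure usable as an elicitation tool when the prior is implicit.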
