
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design

Deepro Choudhury, Sinead Williamson, Adam Goliński, Ning Miao, Freddie Bickford Smith, Michael Kirchhof, Yizhe Zhang, Tom Rainforth

TL;DR

BED-LLM reframes interactive information gathering with LLMs as a sequential Bayesian experimental design problem. It derives a joint model from the LLM and uses expected information gain (EIG) to select queries. It advocates a prior–likelihood pairing with belief filtering, $p_f(\theta; h_{t-1})$, rather than a purely in-context update, and employs a Rao–Blackwellized estimator to compute the EIG of candidate questions. Across 20 Questions and preference-elicitation tasks, BED-LLM achieves substantial gains over direct prompting of the LLM and other adaptive design strategies. It also remains robust when the questioner and answerer are different models. The work provides a principled, scalable blueprint for turning LLMs into adaptive information-gathering agents, with practical applications such as preference elicitation and interactive surveys.
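The following Python sketch roughly illustrates how the EIG of a single candidate question $x$ might be computed from a finite set of sampled hypotheses $\theta_i$. It assumes a finite answer set and access to answer likelihoods $p(y \mid \theta_i, x)$ for each hypothesis. Here "Rao–Blackwellized" is taken to mean summing exactly over the answer set instead of sampling answers. That reading may differ in detail from the paper's estimator. The function names and argument layout are hypothetical.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def eig_rao_blackwellized(
    weights: Sequence[float],
    answer_likelihoods: Sequence[Sequence[float]],
) -> float:
    """Hypothetical helper: EIG of one candidate question x.

    weights[i]: unnormalized belief weight of sampled hypothesis theta_i.
    answer_likelihoods[i][k]: assumed p(y_k | theta_i, x) over a finite answer set.

    EIG(x) = H[p(y | x)] - sum_i w_i * H[p(y | theta_i, x)],
    with p(y | x) = sum_i w_i * p(y | theta_i, x). Both entropies are summed
    exactly over the answers rather than estimated by sampling answers.
    """
    total = sum(weights)
    w = [wi / total for wi in weights]
    num_answers = len(answer_likelihoods[0])
    # Marginal predictive distribution over answers under the current belief.
    marginal = [
        sum(w[i] * answer_likelihoods[i][k] for i in range(len(w)))
        for k in range(num_answers)
    ]
    # Expected entropy of the answer once the hypothesis is known.
    expected_conditional = sum(
        w[i] * entropy(answer_likelihoods[i]) for i in range(len(w))
    )
    return entropy(marginal) - expected_conditional
```

Suppose there are four equally weighted hypotheses and a yes/no question that splits them two against two with certainty. In that case the estimate equals $\log 2$ nats, the maximum possible for a binary answer.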

Abstract

We propose a general-purpose approach for improving the ability of Large Language Models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to act as effective multi-turn conversational agents and interactively interface with external environments. Our approach, which we call BED-LLM (Bayesian Experimental Design with Large Language Models), is based on iteratively choosing questions or queries that maximize the expected information gain (EIG) about the task of interest given the responses gathered previously. We show how this EIG can be formulated (and then estimated) in a principled way using a probabilistic model derived from the LLM's predictive distributions and provide detailed insights into key decisions in its construction and updating procedure. We find that BED-LLM achieves substantial gains in performance across a wide range of tests based on the 20 questions game and using the LLM to actively infer user preferences, compared to direct prompting of the LLM and other adaptive design strategies.
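The sequential procedure described in the abstract can be sketched on a toy 20 Questions instance. At each turn, the sketch asks the question with the highest EIG, observes the answer, and updates the belief. This is a minimal sketch, not the authors' implementation. The toy world (`ATTRS`), the deterministic `likelihood`, and the helper names are hypothetical stand-ins for the LLM-derived predictive distributions. The belief is taken to be uniform over the surviving hypotheses. Belief updating is reduced to hard filtering of hypotheses that assign zero probability to the observed answer.

```python
import math


def entropy(probs):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def eig(hypotheses, question, likelihood, answers):
    """EIG of `question` under a uniform belief over `hypotheses`."""
    n = len(hypotheses)
    rows = [[likelihood(y, h, question) for y in answers] for h in hypotheses]
    marginal = [sum(row[k] for row in rows) / n for k in range(len(answers))]
    return entropy(marginal) - sum(entropy(row) for row in rows) / n


def bed_loop(hypotheses, questions, likelihood, answers, answerer, max_turns=20):
    """Greedy sequential design: ask the max-EIG question, then filter beliefs."""
    history = []
    for _ in range(max_turns):
        asked = {q for q, _ in history}
        remaining = [q for q in questions if q not in asked]
        if len(hypotheses) <= 1 or not remaining:
            break
        best = max(remaining, key=lambda q: eig(hypotheses, q, likelihood, answers))
        answer = answerer(best)
        history.append((best, answer))
        # Hard filtering (a simplification): keep hypotheses consistent with the answer.
        hypotheses = [h for h in hypotheses if likelihood(answer, h, best) > 0.0]
    return hypotheses, history


# Hypothetical toy world: each animal's true yes/no answer to each question.
ATTRS = {
    "cat": {"mammal": True, "flies": False, "aquatic": False},
    "eagle": {"mammal": False, "flies": True, "aquatic": False},
    "shark": {"mammal": False, "flies": False, "aquatic": True},
    "bat": {"mammal": True, "flies": True, "aquatic": False},
}


def likelihood(answer, hypothesis, question):
    """Deterministic answerer model: p(answer | hypothesis, question) is 0 or 1."""
    return 1.0 if (answer == "yes") == ATTRS[hypothesis][question] else 0.0


survivors, history = bed_loop(
    hypotheses=list(ATTRS),
    questions=["mammal", "flies", "aquatic"],
    likelihood=likelihood,
    answers=["yes", "no"],
    answerer=lambda q: "yes" if ATTRS["bat"][q] else "no",  # secret target: bat
)
```

The loop first asks "mammal". That question ties with "flies" at $\log 2$ nats and wins as the first maximum. It then asks "flies" and ends with `survivors == ["bat"]` after two questions. The low-EIG question "aquatic" is never asked.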

Paper Structure

This paper contains 57 sections, 7 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: Success rate on 20 Questions: mean $\pm$ standard error across 100 targets per dataset.
  • Figure 2: Mean rating across 10 film recommendations: mean $\pm$ standard error across 200 users.
  • Figure 3: Predictive entropy vs. expected information gain (EIG) in a film-preference elicitation task. Left: very high predictive entropy (the answer is completely unknown) but EIG $= 0$, because the answer provides no insight into the user's film preferences. Right: both predictive entropy and EIG are high, because the answer is uncertain and different answers would lead to markedly different posterior updates, making the question informative for learning film preferences. The two criteria can therefore select different questions.
  • Figure 4: An example of the sample distribution generated using the prompt for sampling hypotheses $\theta$, conditioned on the two question/answer pairs at the top of this figure. At this stage of the game, we independently sample 200 hypotheses from the LLM and record their frequencies. The distribution exhibits strong mode collapse, with most of the mass concentrated on just a few answers, which degrades the performance of the data-estimation method. This summary is for diagnostic purposes only: Algorithm 1 operates on the probabilities of individual samples and never instantiates such an aggregated summary.
  • Figure 5: Animals problem set (generated using OpenAI o3, with manual curation)
  • ...and 1 more figure
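The distinction drawn in Figure 3 can be checked numerically. The Python sketch below uses two hypothetical, equally likely film-preference profiles and hand-picked answer probabilities, none of which are taken from the paper. It compares an irrelevant question, whose yes/no answer is a coin flip under both profiles, with a relevant question, whose answer depends on the profile.

```python
import math


def entropy(probs):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def predictive_entropy_and_eig(weights, likelihoods):
    """Return (H[p(y)], EIG) for one question.

    weights[i] is the belief in hypothesis theta_i; likelihoods[i][k] is the
    (assumed) probability p(y_k | theta_i) for the question being scored.
    """
    num_answers = len(likelihoods[0])
    marginal = [
        sum(w * row[k] for w, row in zip(weights, likelihoods))
        for k in range(num_answers)
    ]
    predictive_entropy = entropy(marginal)
    expected_conditional = sum(w * entropy(row) for w, row in zip(weights, likelihoods))
    return predictive_entropy, predictive_entropy - expected_conditional


# Hypothetical belief over two preference profiles (e.g. likes / dislikes horror).
weights = [0.5, 0.5]
# Irrelevant question: the yes/no answer is a coin flip under either profile.
irrelevant = [[0.5, 0.5], [0.5, 0.5]]
# Relevant question (e.g. "Do you enjoy horror films?"): the answer tracks the profile.
relevant = [[0.9, 0.1], [0.1, 0.9]]

h_irr, eig_irr = predictive_entropy_and_eig(weights, irrelevant)
h_rel, eig_rel = predictive_entropy_and_eig(weights, relevant)
# Both questions have predictive entropy log 2; EIG is 0 for the irrelevant one
# and log 2 - H(0.9, 0.1), about 0.37 nats, for the relevant one.
```

Both questions look equally uncertain from the answer's point of view. Only the second one is worth asking, because only its answers would move the belief over preferences.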