Bayesian Preference Elicitation with Language Models

Kunal Handa; Yarin Gal; Ellie Pavlick; Noah Goodman; Jacob Andreas; Alex Tamkin; Belinda Z. Li

TL;DR

OPEN is a domain-agnostic framework that combines language models (LMs) with Bayesian Optimal Experimental Design (BOED) to actively learn user preferences through natural-language queries. It featurizes a domain with LM-derived natural-language features, initializes a prior over linear preference weights, selects the most informative pairwise question by information gain, verbalizes that question with an LM, and updates its beliefs with a particle-filter-based posterior. In content recommendation experiments with human participants, OPEN outperforms LM-only and BOED-only baselines in both predictive accuracy and elicitation efficiency, and it offers improved transparency through explicit feature weights and uncertainty. The work highlights the value of combining structured, uncertainty-based querying with flexible natural-language interfaces, and it outlines directions for broader domains, open-ended queries, and reproducibility considerations.
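The select-then-update loop summarized above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the logistic (Bradley-Terry style) choice model, the Gaussian prior, the exhaustive search over item pairs, the simulated user, and all function names are assumptions made for the example. It also reweights particles without resampling, which is a simplification of a full particle filter, and it omits the LM steps (featurization and verbalization) entirely.

```python
import numpy as np

def binary_entropy(p):
    # Entropy (in nats) of a Bernoulli(p) outcome, clipped for stability.
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def choice_prob(particles, x_a, x_b):
    # ASSUMPTION: logistic choice model over linear utilities theta . x.
    return 1.0 / (1.0 + np.exp(-(particles @ (x_a - x_b))))

def expected_info_gain(particles, weights, x_a, x_b):
    # Mutual information between the binary answer and theta:
    # H(marginal answer) - E_theta[H(answer | theta)].
    p = choice_prob(particles, x_a, x_b)
    marginal = np.sum(weights * p)
    return binary_entropy(marginal) - np.sum(weights * binary_entropy(p))

def select_query(particles, weights, items):
    # Exhaustively score every item pair and keep the most informative one.
    best_pair, best_gain = None, -np.inf
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            gain = expected_info_gain(particles, weights, items[i], items[j])
            if gain > best_gain:
                best_pair, best_gain = (i, j), gain
    return best_pair, best_gain

def update_weights(particles, weights, x_a, x_b, prefers_a):
    # Bayesian reweighting of particles by the likelihood of the answer.
    p = choice_prob(particles, x_a, x_b)
    new_weights = weights * (p if prefers_a else 1.0 - p)
    return new_weights / new_weights.sum()

rng = np.random.default_rng(0)
particles = rng.normal(size=(500, 3))   # samples from a hypothetical prior
weights = np.full(500, 1.0 / 500)
items = rng.normal(size=(6, 3))         # hypothetical featurized items
true_theta = np.array([1.0, -0.5, 0.2]) # simulated user, for illustration

for _ in range(5):
    (i, j), gain = select_query(particles, weights, items)
    prefers_a = bool(true_theta @ (items[i] - items[j]) > 0)
    weights = update_weights(particles, weights, items[i], items[j], prefers_a)

posterior_mean = weights @ particles    # point estimate of feature weights
```

Because the particle weights are explicit, the posterior mean and the spread of the weighted particles give a direct readout of the learned feature weights and the remaining uncertainty, which is the kind of transparency the summary refers to.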

Abstract

Aligning AI systems to users' interests requires understanding and incorporating humans' complex values and preferences. Recently, language models (LMs) have been used to gather information about the preferences of human users. This preference data can be used to fine-tune or guide other LMs and/or AI systems. However, LMs have been shown to struggle with crucial aspects of preference learning: quantifying uncertainty, modeling human mental states, and asking informative questions. These challenges have been addressed in other areas of machine learning, such as Bayesian Optimal Experimental Design (BOED), which focuses on designing informative queries within a well-defined feature space. But these methods, in turn, are difficult to scale and apply to real-world problems where simply identifying the relevant features can be difficult. We introduce OPEN (Optimal Preference Elicitation with Natural language), a framework that uses BOED to guide the choice of informative questions and an LM to extract features and translate abstract BOED queries into natural language questions. By combining the flexibility of LMs with the rigor of BOED, OPEN can optimize the informativity of queries while remaining adaptable to real-world domains. In user studies, we find that OPEN outperforms existing LM- and BOED-based methods for preference elicitation.

Paper Structure (50 sections, 10 equations, 10 figures)

Introduction
Preliminaries & Background
Bayesian Optimal Experimental Design
Modeling Human Preferences
The OPEN Framework
Featurization
Initializing User Preferences
Selecting the Optimal Question
Querying the User: Mapping Pairwise Comparisons to NL
Posterior Update
Prediction
Experimental Setup
Hyperparameters
Baselines
LM-only Open-Ended Questions
...and 35 more sections

Figures (10)

Figure 1: Overview of the OPEN framework. The parts that use a language model are shown in red. During the elicitation stage, a domain $D$ is first featurized into features $\phi$ with a language model, which also gives a ranking of importance over features (used to initialize a prior $p(\theta)$ over user preferences). Based on the prior over user preferences, the optimal pairwise comparison query $q$ is computed and then verbalized into natural language using an LM. The user's response is then used to update the belief over preferences. During the prediction stage, an LM featurizes a test sample according to the featurization $\phi$ derived in the elicitation stage, and a preference score is then computed using the elicited preferences $\theta$.
Figure 2: Time-Integrated Delta Accuracy (TIDA) for each method. We report the integral of the delta accuracy over time. OPEN improves over naive prompting-based approaches, all while improving transparency and reducing computational cost. Error bars are one standard error.
Figure 3: Sample transcripts from OPEN vs. Baselines
Figure 4: Accuracy for OPEN's absolute feature rankings, users' self-reported absolute feature rankings, and OPEN's learned weights over features, as described in the analysis section. Our analysis indicates that removing access to precise weights hurts performance. Error bars are one standard error.
Figure 5: Start screen
...and 5 more figures
