Eliciting Human Preferences with Language Models

Belinda Z. Li; Alex Tamkin; Noah Goodman; Jacob Andreas

TL;DR

This paper introduces Generative Active Task Elicitation (GATE), a framework that uses language models to interactively elicit and infer human task preferences, addressing the shortcomings of prompting and traditional label-based paradigms. By treating elicitation as a dialogue-driven process, GATE generates informative edge cases and questions (yes/no and open-ended) to rapidly align model behavior with user values. Across email validation, content recommendation, and moral reasoning, GATE demonstrates improved alignment and comparable or reduced user effort relative to baselines, highlighting the method's strength in handling edge cases and nebulous preferences. The work also discusses limitations, potential risks, and directions for scaling and applying LM-driven elicitation to complex, high-stakes domains.

Abstract

Language models (LMs) can be directed to perform target tasks by using labeled examples or natural language prompts. But selecting examples or writing prompts can be challenging, especially in tasks that involve unusual edge cases, demand precise articulation of nebulous preferences, or require an accurate mental model of LM behavior. We propose to use *LMs themselves* to guide the task specification process. In this paper, we introduce **Generative Active Task Elicitation (GATE)**: a learning framework in which models elicit and infer intended behavior through free-form, language-based interaction with users. We study GATE in three domains: email validation, content recommendation, and moral reasoning. In preregistered experiments, we show that LMs prompted to perform GATE (e.g., by generating open-ended questions or synthesizing informative edge cases) elicit responses that are often more informative than user-written prompts or labels. Users report that interactive task elicitation requires less effort than prompting or example labeling and surfaces novel considerations not initially anticipated by users. Our findings suggest that LM-driven elicitation can be a powerful tool for aligning models to complex human preferences and values.

Paper Structure (69 sections, 2 equations, 15 figures, 1 table)

Introduction
Learning as Task Elicitation
The Task Elicitation Framework
Existing Learning Paradigms in the Task Elicitation Framework
Supervised learning: passive, example-based
Active learning: interactive, example-based
Prompting: passive, free-form
Generative Active Task Elicitation
Methods for GATE
Generative active learning
Generating yes-or-no questions
Generating open-ended questions
Experiment Setup
Domains and datasets
Content Recommendation
...and 54 more sections

Figures (15)

Figure 1: Generative Active Task Elicitation (GATE) elicits user preferences through interactive, free-form questions, which can then be used in downstream decision-making. Unlike non-interactive elicitation approaches (e.g., prompting), which rely entirely on the human to elucidate their preferences, generative elicitation is better able to probe nuances of human preferences. Unlike active learning approaches, generative elicitation can ask more generic, free-form questions. The three parts of this figure illustrate: (A) Fuzzy user preferences: A user wishes to translate their fuzzy preferences for how a task should be performed into a specification for a machine learning model. This is challenging because users lack perfect introspection, preferences can be difficult to specify in language, the specification needs to anticipate tricky real-world edge cases, and models may misgeneralize from provided examples or instructions. (B) Task elicitation: We consider various ways of eliciting these fuzzy preferences from users, including non-interactive prompting, active learning, and generative elicitation (GATE). (C) Evaluation: We evaluate methods on a held-out test set, scoring how well a language model predicted the true decisions made by the user.
Figure 2: Axes of variation in task elicitation.
Figure 3: Across three domains, our LM-prompting implementations of GATE are generally able to elicit human preferences beyond baseline supervised learning, active learning, or human-written prompts. We measure the Area Under the "$\Delta p(correct)$ vs. Interaction time" Curve, which gives us a time-normalized metric for how well (and how quickly) each elicitation method aligns with human preferences. While GATE methods generally outperform the baseline methods as well as no interaction (represented by a $\Delta p(correct)$ of 0), we are only able to establish statistical significance between GATE and baselines in the content recommendation and email verification domains.
Figure 4: Left: GATE methods are equally or less mentally demanding than other methods. We plot the perceived mental demand across methods and domains (higher $=$ greater mental demand). Right: Language model elicitation does not shift human preferences. We plot the proportion of participants who answered "yes" to each test question, comparing no LM interaction (user-written prompts) to LM interaction (GATE) elicitation. The red line is the $y=x$ curve, which serves as a guideline to see how well humans' no-LM interaction preferences align with their preferences post-LM interaction (if they align perfectly, the points should fall along this curve). We see that the points generally hover around this curve.
Figure 5: Excerpts of real transcripts across the different domains and elicitation methods we investigate. The System messages are generated by the language model, while the User messages are produced by human participants. Overall, the model is able to generate diverse and contextually-appropriate questions in each setting. See the sections on domains and elicitation methods for more details on the domains and methods respectively.
...and 10 more figures
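
The Figure 3 caption describes its metric as the area under a "$\Delta p(correct)$ vs. interaction time" curve. As a rough illustration only, the sketch below computes such an area with the trapezoidal rule. The function name `auc_delta_p`, the choice of trapezoidal integration, and the sample numbers are assumptions made for this example; the caption does not say how the authors integrate or normalize the curve.

```python
from typing import Sequence


def auc_delta_p(times: Sequence[float], delta_p: Sequence[float]) -> float:
    """Trapezoidal area under a delta-p(correct) vs. interaction-time curve.

    `times` are interaction times in increasing order and `delta_p` are the
    matching changes in p(correct) relative to no interaction. Both names are
    hypothetical and chosen for this sketch.
    """
    if len(times) != len(delta_p):
        raise ValueError("times and delta_p must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt < 0:
            raise ValueError("times must be sorted in increasing order")
        # Area of one trapezoid between consecutive measurements.
        area += 0.5 * (delta_p[i - 1] + delta_p[i]) * dt
    return area


# Made-up measurements: a method that improves quickly, then plateaus.
example_times = [0.0, 1.0, 2.0, 4.0]
example_delta_p = [0.0, 0.1, 0.2, 0.2]
example_area = auc_delta_p(example_times, example_delta_p)
```

Under this reading, a method that reaches the same final $\Delta p(correct)$ sooner accumulates more area, and a curve that stays at 0 (no interaction) has zero area.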
