Active Preference Inference using Language Models and Probabilistic Reasoning

Wasu Top Piriyakulkij; Volodymyr Kuleshov; Kevin Ellis

TL;DR

This work tackles active preference inference in language-based systems by equipping instruction-tuned LLMs with inference-time probabilistic reasoning to ask informative questions. By defining a probabilistic model grounded in LLM prompts and optimizing information gain through either entropy minimization or model-change maximization, the approach reduces user interaction while improving target-product identification in a WebShop-based setting. The key contributions include a concrete inference-time framework, theoretical equivalence of two information-theoretic objectives, and empirical evidence that the entropy-reduction method outperforms vanilla and ReAct baselines under both binary and soft reward scenarios. The results highlight a practical pathway for making interactive LLM systems more efficient and user-friendly, with potential extensions to non-binary rewards and open-ended queries.

Abstract

Actively inferring user preferences, for example by asking good questions, is important for any human-facing decision-making system. Active inference allows such systems to adapt and personalize themselves to nuanced individual preferences. To enable this ability for instruction-tuned large language models (LLMs), one may prompt them to ask users questions to infer their preferences, transforming the language models into more robust, interactive systems. However, out of the box, these models are not efficient at extracting preferences: the questions they generate are not informative, requiring a high number of user interactions and impeding the usability of the downstream system. In this work, we introduce an inference-time algorithm that helps LLMs quickly infer preferences by using more informative questions. Our algorithm uses a probabilistic model whose conditional distributions are defined by prompting an LLM, and returns questions that optimize expected entropy and expected model change. Results in a simplified interactive web shopping setting with real product items show that an LLM equipped with our entropy reduction algorithm outperforms baselines with the same underlying LLM on task performance while using fewer user interactions.
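
The question-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a finite set of candidate products, yes/no answers, and a hand-written table of answer probabilities standing in for the conditional distributions that the paper obtains by prompting an LLM. All function names, questions, and numbers are hypothetical. "Model change" is taken here to mean the expected KL divergence from prior to posterior, which is also an assumption. Under that reading, it equals the prior entropy minus the expected posterior entropy, so both objectives rank questions identically.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def posterior(prior, lik):
    """Bayes update over items; returns (posterior, probability of the answer)."""
    joint = [pi * li for pi, li in zip(prior, lik)]
    z = sum(joint)
    if z == 0:
        return list(prior), 0.0
    return [j / z for j in joint], z

def answer_likelihoods(p_yes):
    """Likelihood vectors for the two possible answers, 'yes' and 'no'."""
    return (p_yes, [1.0 - p for p in p_yes])

def expected_entropy(prior, p_yes):
    """Expected entropy of the posterior over items after hearing the answer."""
    total = 0.0
    for lik in answer_likelihoods(p_yes):
        post, z = posterior(prior, lik)
        total += z * entropy(post)
    return total

def expected_kl(prior, p_yes):
    """Expected KL(posterior || prior), used here as a stand-in for model change."""
    total = 0.0
    for lik in answer_likelihoods(p_yes):
        post, z = posterior(prior, lik)
        total += z * sum(q * math.log(q / p) for q, p in zip(post, prior) if q > 0)
    return total

def pick_question(prior, questions):
    """Return the question whose answer is expected to leave the least uncertainty."""
    return min(questions, key=lambda q: expected_entropy(prior, questions[q]))

# Hypothetical setup: four products, uniform prior.
prior = [0.25, 0.25, 0.25, 0.25]
# questions[q][i] = assumed probability that the user answers "yes" to q
# when product i is the target (in the paper this would come from an LLM prompt).
questions = {
    "Is it organic?": [0.9, 0.9, 0.1, 0.1],    # splits the items evenly
    "Is it a serum?": [0.9, 0.9, 0.9, 0.9],    # same answer for every item
    "Is it under $20?": [0.9, 0.1, 0.1, 0.1],  # singles out one item
}

best = pick_question(prior, questions)
print(best)
```

In this toy setup the even-split question is selected, while the question with the same answer distribution for every product carries no information and leaves the entropy unchanged.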

Paper Structure (18 sections, 3 equations, 3 figures, 1 table)

Introduction
Active Preference Inference
Task formulation
Existing approaches
Inference-time probabilistic reasoning for asking informative questions
Model definition
Objectives for choosing informative questions
Expected Entropy Minimization.
Expected Model Change Maximization.
Equivalence between the two objectives.
Experiments
Binary Reward
Soft Reward
Related Work
Learning to ask clarifying questions.
...and 3 more sections

Figures (3)

Figure 1: (Top) Vanilla instruction-tuned LLM prompted to be a hair growth serum oil seller. (Bottom) LLM with inference-time expected entropy reduction algorithm.
Figure 2: (Left) Average expected binary reward at increasing number of questions. (Right) Average expected soft reward at increasing number of questions.
Figure 3: Average information gain at each question. Confidence intervals are computed over 150 different tasks.
