Active Preference Inference using Language Models and Probabilistic Reasoning

Wasu Top Piriyakulkij, Volodymyr Kuleshov, Kevin Ellis

TL;DR

This work tackles active preference inference in language-based systems by equipping instruction-tuned LLMs with inference-time probabilistic reasoning to ask informative questions. By defining a probabilistic model grounded in LLM prompts and optimizing information gain through either entropy minimization or model-change maximization, the approach reduces user interaction while improving target-product identification in a WebShop-based setting. The key contributions include a concrete inference-time framework, theoretical equivalence of two information-theoretic objectives, and empirical evidence that the entropy-reduction method outperforms vanilla and ReAct baselines under both binary and soft reward scenarios. The results highlight a practical pathway for making interactive LLM systems more efficient and user-friendly, with potential extensions to non-binary rewards and open-ended queries.

Abstract

Actively inferring user preferences, for example by asking good questions, is important for any human-facing decision-making system. Active inference allows such systems to adapt and personalize themselves to nuanced individual preferences. To enable this ability for instruction-tuned large language models (LLMs), one may prompt them to ask users questions to infer their preferences, transforming the language models into more robust, interactive systems. However, out of the box, these models are not efficient at extracting preferences: the questions they generate are not informative, requiring a high number of user interactions and impeding the usability of the downstream system. In this work, we introduce an inference-time algorithm that helps LLMs quickly infer preferences by using more informative questions. Our algorithm uses a probabilistic model whose conditional distributions are defined by prompting an LLM, and returns questions that optimize expected entropy and expected model change. Results in a simplified interactive web shopping setting with real product items show that an LLM equipped with our entropy reduction algorithm outperforms baselines with the same underlying LLM on task performance while using fewer user interactions.
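
The question-selection step described above can be illustrated with a minimal sketch. It assumes a small set of candidate products, a prior over which one the user wants, and yes/no questions. In the paper the conditional distributions come from prompting an LLM; here the table `questions` is hard-coded, and all names (`expected_posterior_entropy`, `pick_question`, the example questions) are hypothetical stand-ins rather than the authors' implementation.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def expected_posterior_entropy(prior, likelihood_yes):
    """Expected entropy of the posterior over products after one yes/no answer.

    prior[i]          = P(product i is the target)
    likelihood_yes[i] = P(user answers "yes" | product i is the target)
    (In the paper these conditionals come from an LLM; here they are given.)
    """
    total = 0.0
    for answer_is_yes in (True, False):
        like = [l if answer_is_yes else 1.0 - l for l in likelihood_yes]
        unnorm = [p * l for p, l in zip(prior, like)]
        p_answer = sum(unnorm)  # marginal probability of this answer
        if p_answer == 0:
            continue  # impossible answer contributes nothing
        post = [u / p_answer for u in unnorm]  # Bayes' rule
        total += p_answer * entropy(post)
    return total

def pick_question(prior, questions):
    """Return the question whose answer minimizes expected posterior entropy."""
    return min(questions, key=lambda q: expected_posterior_entropy(prior, questions[q]))

# Four candidate products, uniform prior (illustrative values only).
prior = [0.25, 0.25, 0.25, 0.25]
questions = {
    "Is it for dry hair?": [1.0, 1.0, 0.0, 0.0],  # splits the candidates in half
    "Is it organic?":      [1.0, 0.0, 0.0, 0.0],  # isolates a single candidate
}
best = pick_question(prior, questions)
print(best)
```

Under these assumed numbers, the question that splits the candidates evenly leaves less expected uncertainty than the one that singles out a single product, so it is selected.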

Paper Structure (18 sections, 3 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: (Top) Vanilla instruction-tuned LLM prompted to be a hair growth serum oil seller. (Bottom) LLM with inference-time expected entropy reduction algorithm.
  • Figure 2: (Left) Average expected binary reward at increasing number of questions. (Right) Average expected soft reward at increasing number of questions.
  • Figure 3: Average information gain at each question. Confidence intervals are computed over 150 different tasks.