Active Preference Inference using Language Models and Probabilistic Reasoning
Wasu Top Piriyakulkij, Volodymyr Kuleshov, Kevin Ellis
TL;DR
This work tackles active preference inference in language-based systems by equipping instruction-tuned LLMs with inference-time probabilistic reasoning so that they ask informative questions. The approach defines a probabilistic model whose conditional distributions are given by prompting an LLM, and selects questions by either minimizing expected posterior entropy or maximizing expected model change. In a WebShop-based setting, this reduces the number of user interactions while improving identification of the target product. The key contributions are a concrete inference-time framework, a proof that the two information-theoretic objectives are equivalent, and empirical evidence that the entropy-reduction method outperforms vanilla and ReAct baselines under both binary and soft reward scenarios. The results point to a practical way of making interactive LLM systems more efficient and user-friendly, with potential extensions to non-binary rewards and open-ended queries.
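The claimed equivalence of the two objectives can be seen from a standard identity. This is a sketch under an assumption: model change is taken here to be the KL divergence from the prior to the posterior, and the paper's exact definitions may differ. Write h for the unknown target product, q for a candidate question, and a for the user's answer, with the prior p(h) not depending on q:

```latex
\begin{aligned}
\mathbb{E}_{a \sim p(a \mid q)}\Big[\mathrm{KL}\big(p(h \mid q, a)\,\|\,p(h)\big)\Big]
  &= \sum_{a}\sum_{h} p(h, a \mid q)\,\log\frac{p(h \mid q, a)}{p(h)} \\
  &= H\big(p(h)\big) - \mathbb{E}_{a \sim p(a \mid q)}\Big[H\big(p(h \mid q, a)\big)\Big]
   = I(h; a \mid q).
\end{aligned}
```

Because H(p(h)) does not depend on q, the question that minimizes expected posterior entropy is exactly the question that maximizes expected model change; both maximize the mutual information between the target and the answer.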
Abstract
Actively inferring user preferences, for example by asking good questions, is important for any human-facing decision-making system. Inferring preferences in this way allows such systems to adapt and personalize themselves to nuanced individual preferences. To enable this ability for instruction-tuned large language models (LLMs), one may prompt them to ask users questions to infer their preferences, transforming the language models into more robust, interactive systems. However, out of the box, these models are not efficient at extracting preferences: the questions they generate are not informative, requiring a high number of user interactions and impeding the usability of the downstream system. In this work, we introduce an inference-time algorithm that helps LLMs quickly infer preferences by asking more informative questions. Our algorithm uses a probabilistic model whose conditional distributions are defined by prompting an LLM, and selects questions that either minimize expected posterior entropy or maximize expected model change. Results in a simplified interactive web shopping setting with real product items show that an LLM equipped with our entropy reduction algorithm outperforms baselines with the same underlying LLM on task performance while using fewer user interactions.
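To make the question-selection step concrete, here is a minimal Python sketch, assuming a finite set of candidate target products (the hypotheses), a finite list of candidate questions, and yes/no answers. In the paper the conditional p(answer | question, product) is defined by prompting an LLM; here `toy_likelihood`, the product list, the questions, and the `NOISE` rate are hypothetical stand-ins, and all function names are illustrative rather than the authors' code. Each question is scored by its expected posterior entropy (lower is better) and by its expected model change, assumed here to be the KL divergence from prior to posterior (higher is better).

```python
import math
from typing import Callable, Dict, Sequence

# p(answer | question, hypothesis). In the paper this conditional comes from
# prompting an LLM; in this sketch it is a hypothetical stub (toy_likelihood).
Likelihood = Callable[[str, str, str], float]
Dist = Dict[str, float]


def entropy(dist: Dist) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def answer_prob(prior: Dist, question: str, answer: str, lik: Likelihood) -> float:
    """Marginal p(answer | question) = sum_h p(h) * p(answer | question, h)."""
    return sum(p * lik(answer, question, h) for h, p in prior.items())


def posterior(prior: Dist, question: str, answer: str, lik: Likelihood) -> Dist:
    """Bayes' rule: p(h | question, answer) is proportional to p(h) * p(answer | question, h)."""
    unnorm = {h: p * lik(answer, question, h) for h, p in prior.items()}
    z = sum(unnorm.values())
    return {h: w / z for h, w in unnorm.items()}


def expected_entropy(prior: Dist, question: str, answers: Sequence[str], lik: Likelihood) -> float:
    """E_{a ~ p(a|q)}[H(p(h | q, a))]; lower means a more informative question."""
    total = 0.0
    for a in answers:
        pa = answer_prob(prior, question, a, lik)
        if pa > 0:
            total += pa * entropy(posterior(prior, question, a, lik))
    return total


def expected_model_change(prior: Dist, question: str, answers: Sequence[str], lik: Likelihood) -> float:
    """E_{a ~ p(a|q)}[KL(p(h | q, a) || p(h))]; higher means a more informative question."""
    total = 0.0
    for a in answers:
        pa = answer_prob(prior, question, a, lik)
        if pa > 0:
            post = posterior(prior, question, a, lik)
            total += pa * sum(q * math.log(q / prior[h]) for h, q in post.items() if q > 0)
    return total


def pick_question(prior: Dist, questions: Sequence[str], answers: Sequence[str], lik: Likelihood) -> str:
    """Return the candidate question with the lowest expected posterior entropy."""
    return min(questions, key=lambda q: expected_entropy(prior, q, answers, lik))


# --- Toy usage: hypothetical products and a noisy yes/no "user" model. ---
ATTRS = {
    "red shoe": {"red", "shoe"},
    "blue shoe": {"blue", "shoe"},
    "red hat": {"red", "hat"},
    "blue hat": {"blue", "hat"},
}
QUESTION_ATTRS = {"Is it red?": {"red"}, "Is it a red shoe?": {"red", "shoe"}}
NOISE = 0.05  # hypothetical probability that the simulated user answers incorrectly


def toy_likelihood(answer: str, question: str, product: str) -> float:
    """Stub for p(answer | question, product): 'yes' iff the product has all asked attributes, flipped with prob NOISE."""
    truthful_yes = QUESTION_ATTRS[question] <= ATTRS[product]  # subset test
    p_yes = 1 - NOISE if truthful_yes else NOISE
    return p_yes if answer == "yes" else 1 - p_yes


prior = {h: 1 / len(ATTRS) for h in ATTRS}
best = pick_question(prior, list(QUESTION_ATTRS), ["yes", "no"], toy_likelihood)
print(best)  # Is it red?
```

With exact enumeration, as here, both scores rank questions the same way: the evenly splitting question "Is it red?" is preferred over the narrower "Is it a red shoe?". A real system would replace `toy_likelihood` with LLM-derived probabilities and, after each answer, use the updated posterior as the next prior.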
