Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation

David Eric Austin; Anton Korikov; Armin Toroghi; Scott Sanner

TL;DR

The paper tackles cold-start conversational recommendation by reframing natural-language preference elicitation (NL-PE) as a Bayesian optimization problem over NL item descriptions. It introduces PEBOL, an NL-PE algorithm that maintains Beta beliefs over item utilities and uses LLM-based acquisition functions, guided by decision-theoretic policies (TS, UCB, ER), to generate short, targeted NL queries. Preferences inferred via NLI between user utterances and item descriptions drive the posterior updates, enabling an efficient exploration-exploitation trade-off. In simulated-user experiments on real datasets, PEBOL substantially improves MRR@10 over monolithic LLM baselines (up to 0.27 vs. 0.17 after 10 turns), supporting the benefit of combining Bayesian reasoning with NL-PE in ConvRec systems.
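
As a rough illustration of the Beta-belief update described above, the sketch below keeps an independent Beta(alpha_i, beta_i) belief for each item and treats an NLI entailment score w_i in [0, 1] as a soft Bernoulli observation, adding w_i to alpha_i and 1 - w_i to beta_i. This is a minimal sketch under assumptions, not the paper's code: the uniform Beta(1, 1) prior, the soft-update rule, and the class name `BetaBeliefs` are illustrative, and the entailment scores are assumed to come from an external NLI model that is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BetaBeliefs:
    """Independent Beta(alpha_i, beta_i) beliefs over each item's utility (hypothetical helper)."""
    alpha: List[float]
    beta: List[float]

    @classmethod
    def uniform(cls, n_items: int) -> "BetaBeliefs":
        # Assumed cold-start prior: Beta(1, 1), i.e. uniform utility for every item.
        return cls([1.0] * n_items, [1.0] * n_items)

    def update(self, entailment: List[float]) -> None:
        # entailment[i] is an NLI score in [0, 1] for item i's description given the
        # latest query-response pair (assumed to come from an external NLI model).
        if len(entailment) != len(self.alpha):
            raise ValueError("need one entailment score per item")
        if any(not 0.0 <= w <= 1.0 for w in entailment):
            raise ValueError("entailment scores must lie in [0, 1]")
        # Soft Bernoulli observation (assumed rule): add w pseudo-successes
        # and (1 - w) pseudo-failures to each item's Beta posterior.
        for i, w in enumerate(entailment):
            self.alpha[i] += w
            self.beta[i] += 1.0 - w

    def means(self) -> List[float]:
        # Posterior mean utility alpha / (alpha + beta), usable for ranking items.
        return [a / (a + b) for a, b in zip(self.alpha, self.beta)]


beliefs = BetaBeliefs.uniform(3)
beliefs.update([0.9, 0.2, 0.5])
print([round(m, 2) for m in beliefs.means()])  # [0.63, 0.4, 0.5]
```

Starting from a uniform prior, a single turn with scores [0.9, 0.2, 0.5] already separates the items, so item 0 would be ranked first. Repeated turns keep accumulating pseudo-counts, which narrows each posterior over time.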

Abstract

Designing preference elicitation (PE) methodologies that can quickly ascertain a user's top item preferences in a cold-start setting is a key challenge for building effective and personalized conversational recommendation (ConvRec) systems. While large language models (LLMs) enable fully natural language (NL) PE dialogues, we hypothesize that monolithic LLM NL-PE approaches lack the multi-turn, decision-theoretic reasoning required to effectively balance the exploration and exploitation of user preferences towards an arbitrary item set. In contrast, traditional Bayesian optimization PE methods define theoretically optimal PE strategies, but cannot generate arbitrary NL queries or reason over content in NL item descriptions, instead requiring users to express preferences via ratings or comparisons of unfamiliar items. To overcome the limitations of both approaches, we formulate NL-PE in a Bayesian optimization (BO) framework that seeks to actively elicit NL feedback to identify the best recommendation. Key challenges in generalizing BO to deal with natural language feedback include determining: (a) how to leverage LLMs to model the likelihood of NL preference feedback as a function of item utilities, and (b) how to design an acquisition function for NL BO that can elicit preferences in the infinite space of language. We demonstrate our framework in a novel NL-PE algorithm, PEBOL, which uses: 1) Natural Language Inference (NLI) between user preference utterances and NL item descriptions to maintain Bayesian preference beliefs, and 2) BO strategies such as Thompson Sampling (TS) and Upper Confidence Bound (UCB) to steer LLM query generation. We numerically evaluate our methods in controlled simulations, finding that after 10 turns of dialogue, PEBOL can achieve an MRR@10 of up to 0.27 compared to the best monolithic LLM baseline's MRR@10 of 0.17, despite relying on earlier and smaller LLMs.
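
Steering LLM query generation with a BO policy comes down to choosing which item description to ask about next. The sketch below shows two such selection rules over per-item Beta(alpha_i, beta_i) beliefs: Thompson Sampling, which draws one utility per item from its posterior and picks the largest draw, and a UCB-style rule that adds `c` posterior standard deviations to the posterior mean. This is an illustrative sketch under assumptions, not the paper's implementation: the function names, the mean-plus-std form of UCB, and the constant `c` are hypothetical, and the LLM call that would turn the selected item's description into a question is omitted.

```python
import math
import random
from typing import List, Optional


def thompson_select(alpha: List[float], beta: List[float],
                    rng: Optional[random.Random] = None) -> int:
    """Thompson Sampling: sample one utility per item from its Beta posterior
    and return the index of the largest sample."""
    if rng is None:
        rng = random.Random()
    draws = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    return max(range(len(draws)), key=draws.__getitem__)


def ucb_select(alpha: List[float], beta: List[float], c: float = 1.0) -> int:
    """UCB-style rule: posterior mean plus c posterior standard deviations.
    The mean-plus-std form and the default c are illustrative assumptions."""
    def upper_bound(a: float, b: float) -> float:
        n = a + b
        mean = a / n
        std = math.sqrt(a * b / (n * n * (n + 1.0)))  # std of Beta(a, b)
        return mean + c * std
    scores = [upper_bound(a, b) for a, b in zip(alpha, beta)]
    return max(range(len(scores)), key=scores.__getitem__)


# Two items with equal posterior means: Beta(5, 5) vs. the unobserved Beta(1, 1).
print(ucb_select([5.0, 1.0], [5.0, 1.0]))  # 1: the wider posterior wins
```

Because both rules favor items whose posteriors are still wide, they keep exploring unobserved items while still concentrating on items with high estimated utility, which is the balance the abstract argues monolithic LLM prompting lacks.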

Paper Structure (47 sections, 22 equations, 14 figures, 13 tables)

Introduction
Background and Related Work
Bayesian Optimization
Preference Elicitation
Language-Based Preference Elicitation
Conversational Recommendation
Natural Language Inference
Problem Definition
Methodology
Limitations of Monolithic LLM Prompting
PEBOL Overview
Utility Beliefs
Prior Beliefs
Observation Model
Binary Item Response Likelihoods and Posterior Update
...and 32 more sections

Figures (14)

Figure 1: PEBOL's belief updates over a cold-start user's item utilities during three turns of NL dialogue. Bayesian preference beliefs not only facilitate recommendation, but also enable Bayesian optimization policies to guide LLM query generation, avoiding over-exploration (asking about clearly low-value items) and over-exploitation (over-focusing on known preferences).
Figure 2: The PEBOL NL-PE algorithm, which maintains a Bayesian belief state over a user's item preferences given an arbitrary set of NL item descriptions $\mathbf{x}$. This belief is used by a decision-theoretic policy to balance the exploration and exploitation of preferences by strategically selecting an item description $x_{i^t}$ as the basis for LLM query generation. Belief updates are computed through Bayesian inference with NLI entailment scores between item descriptions and query-response pairs.
Figure 3: Cherry-picked system-generated dialogues from our NL-PE experiments. The Monolithic GPT-3.5 dialogue (left) demonstrates over-exploitation, with $q^3$ directly extending $q^2$ after a positive user preference is observed and leading to the extreme case of query repetition ($q^4 = q^3$). In contrast, PEBOL (right) continues exploring even after a positive response, while focusing on promising aspects (three out of four queries elicit a positive response) by using UCB-guided query generation.
Figure 4: MRR@10 for MonoLLM and PEBOL-P with uncertainty-informed policies (UCB, TS, ER). All methods show preference learning over time and MonoLLM is generally outperformed by PEBOL.
Figure 5: MRR@10 for PEBOL-P with various context acquisition policies.
...and 9 more figures
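
Figures 4 and 5 report MRR@10. For reference, the sketch below computes MRR@k assuming each simulated user has exactly one target item: the reciprocal rank is 1/position when the target appears in the top k of the recommended list and 0 otherwise, averaged over users. The one-target-per-user setup and the function names are assumptions for illustration; the paper's exact evaluation protocol is not reproduced here.

```python
from typing import Hashable, List, Sequence


def reciprocal_rank_at_k(ranking: Sequence[Hashable], target: Hashable, k: int = 10) -> float:
    """Return 1 / position of the target if it is in the top-k, else 0."""
    for position, item in enumerate(ranking[:k], start=1):
        if item == target:
            return 1.0 / position
    return 0.0


def mrr_at_k(rankings: List[Sequence[Hashable]], targets: List[Hashable], k: int = 10) -> float:
    """Average reciprocal rank over users, assuming one target item per user."""
    if len(rankings) != len(targets):
        raise ValueError("need exactly one target per ranking")
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank_at_k(r, t, k) for r, t in zip(rankings, targets))
    return total / len(rankings)


rankings = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
print(mrr_at_k(rankings, targets=["a", "a", "z"]))  # 0.5
```

Here the targets land at ranks 1 and 2 for the first two users and outside the list for the third, giving (1 + 1/2 + 0) / 3 = 0.5. Under this metric, an MRR@10 of 0.27 roughly corresponds to the target sitting, on average, a bit below the top few positions.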
