
AdaptFuse: Training-Free Sequential Preference Learning via Externalized Bayesian Inference

Fangzhou Lin, Peiran Li, Shuo Xing, Siyuan Yang, Qianwen Ge, Kazunori Yamada, Ziming Zhang, Haichong Zhang, Zhengzhong Tu

Abstract

Large language models struggle to accumulate evidence across multiple rounds of user interaction, failing to update their beliefs in a manner consistent with Bayesian inference. Existing solutions require fine-tuning on sensitive user interaction data, limiting their applicability in privacy-conscious settings. We propose AdaptFuse, a training-free framework that externalizes probabilistic computation entirely from the LLM: a symbolic module maintains a Bayesian posterior over a discrete hypothesis set, while a frozen LLM contributes semantic reasoning via multi-sample Dirichlet aggregation. The two signals are combined through entropy-adaptive fusion, which automatically weights each source by its predictive confidence, shifting reliance from the LLM to the symbolic posterior as evidence accumulates. We evaluate across three domains (flight recommendation, hotel recommendation, and web shopping) on Gemma 2 9B, Llama 3 8B, and Qwen 2.5 7B. AdaptFuse consistently outperforms both prompting baselines and fine-tuned Bayesian Teaching models on all tasks, with accuracy improving monotonically over interaction rounds. These results demonstrate that principled inference-time algorithms can substitute for fine-tuning in personalized recommendation, without storing or training on sensitive user data. All code and materials will be open-sourced.

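To make the entropy-adaptive fusion described in the abstract concrete, here is a minimal sketch in Python. It assumes the fusion is an entropy-weighted convex combination of the two categorical predictions; the function names (`entropy_adaptive_fusion`) and the exact weighting scheme are illustrative assumptions, not the paper's stated formulation.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a categorical distribution (natural log)."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p))

def entropy_adaptive_fusion(p_llm, p_bayes):
    """Fuse the LLM's prediction with the symbolic Bayesian posterior,
    weighting each source by its predictive confidence.

    Illustrative scheme: lower entropy (higher confidence) earns a larger
    weight. As symbolic evidence accumulates, p_bayes concentrates, its
    entropy drops, and the fusion shifts reliance toward it.
    """
    K = len(p_llm)
    h_max = np.log(K)                         # maximum possible entropy
    conf_llm = 1.0 - entropy(p_llm) / h_max
    conf_bayes = 1.0 - entropy(p_bayes) / h_max
    lam = conf_bayes / (conf_llm + conf_bayes + 1e-12)
    fused = lam * p_bayes + (1.0 - lam) * p_llm
    return fused / fused.sum()

# Early round: the symbolic posterior is still diffuse, so the
# fused prediction leans on the (more confident) LLM signal.
p_llm = np.array([0.7, 0.2, 0.1])
p_bayes = np.array([0.4, 0.35, 0.25])
print(entropy_adaptive_fusion(p_llm, p_bayes))
```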

Paper Structure

This paper contains 43 sections, 2 theorems, 18 equations, 4 figures, 5 tables, 1 algorithm.

Key Result

Lemma 1

Let $\theta \sim \mathrm{Dir}(\alpha_0 \mathbf{1}_K)$ be a prior over the categorical parameter. Treating the confidence-weighted vectors $\{w_{s,i}\}$ as fractional pseudo-count observations (a standard extension of Dirichlet-Multinomial conjugacy; elkan2006clustering, minka2000estimating), the posterior remains Dirichlet, $\theta \mid \{w_{s,i}\} \sim \mathrm{Dir}\big(\alpha_0 \mathbf{1}_K + \sum_s w_s\big)$, with posterior mean $\mathbb{E}[\theta_i \mid \{w_{s,i}\}] = \frac{\alpha_0 + \sum_s w_{s,i}}{K\alpha_0 + \sum_{s,j} w_{s,j}}$.
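A short sketch of the conjugate update behind Lemma 1, assuming each LLM sample $s$ yields a confidence vector over the $K$ hypotheses stored as a row of an `(S, K)` array; the layout and the `alpha0` default are assumptions for illustration.

```python
import numpy as np

def dirichlet_aggregate(weights, alpha0=1.0):
    """Posterior mean under a Dir(alpha0 * 1_K) prior, treating the
    confidence-weighted vectors as fractional pseudo-counts (Lemma 1).

    weights: (S, K) array, one confidence vector per LLM sample.
    Returns E[theta | {w_s}], the aggregated categorical prediction.
    """
    weights = np.asarray(weights, dtype=float)
    alpha_post = alpha0 + weights.sum(axis=0)  # Dir(alpha0*1_K + sum_s w_s)
    return alpha_post / alpha_post.sum()       # posterior mean

# Three LLM samples over K = 3 hypotheses.
w = [[0.8, 0.1, 0.1],
     [0.6, 0.3, 0.1],
     [0.7, 0.2, 0.1]]
print(dirichlet_aggregate(w, alpha0=1.0))  # concentrates on hypothesis 0
```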

Figures (4)

  • Figure 1: Accuracy over interaction rounds on the flight task. We show accuracy from the first through the final (fifth) round across different methods, including original LLMs, models fine-tuned with Oracle Learning and Bayesian Teaching (qiu2026bayesian), and our training-free AdaptFuse.
  • Figure 2: Varying task complexity. Final-round accuracy across methods as the number of item attributes $d$ varies from 2 to 8.
  • Figure 3: Generalization to new domains. (a) Final-round accuracy on the hotel recommendation task. (b) Final-round accuracy on the web shopping task. Error bars denote standard error over three random seeds.
  • Figure 4: Interactive preference inference on the flight recommendation task. We show accuracy after the first and final (fifth) rounds across different assistants, including original LLMs, models fine-tuned with the Bayesian Assistant, and models fine-tuned with an oracle that provides correct answers (models provided by qiu2026bayesian). Both fine-tuning approaches improve performance, and our method achieves the best overall results. Error bars denote the standard error over three random seeds.

Theorems & Definitions (6)

  • Lemma 1: Dirichlet Aggregation as Posterior Mean
  • Proof
  • Proof
  • Proposition 1: Fusion Bound
  • Proof
  • Remark 1: Connection to posterior concentration