Vague Preference Policy Learning for Conversational Recommendation
Gangyi Zhang, Chongming Gao, Wenqiang Lei, Xiaojie Guo, Shijun Li, Hongshen Chen, Zhuozhi Ding, Sulong Xu, Lingfei Wu
TL;DR
The paper addresses a limiting assumption in conversational recommender systems, namely that users have clear preferences, by introducing the Vague Preference Multi-round Conversational Recommendation (VPMCR) scenario. It proposes VPPL, a two-component framework: Ambiguity-aware Soft Estimation (ASE) produces soft, time-decayed preference distributions, and Dynamism-aware Policy Learning (DPL) uses a graph-based conversation model and action pruning to guide reinforcement-learning-based decisions. Extensive experiments on four real-world datasets show that VPPL outperforms baselines on SR@15, hDCG@(15,10), and efficiency, supporting its ability to preserve diversity and adapt to evolving user preferences. The approach advances CRS by explicitly modeling vagueness and relative decision-making, offering practical improvements for real-world conversational search and recommendation systems.
Abstract
Conversational recommendation systems (CRS) commonly assume users have clear preferences, leading to potential over-filtering of relevant alternatives. However, users often exhibit vague, non-binary preferences. We introduce the Vague Preference Multi-round Conversational Recommendation (VPMCR) scenario, employing a soft estimation mechanism to accommodate users' vague and dynamic preferences while mitigating over-filtering. In VPMCR, we propose Vague Preference Policy Learning (VPPL), consisting of Ambiguity-aware Soft Estimation (ASE) and Dynamism-aware Policy Learning (DPL). ASE captures preference vagueness by estimating scores for clicked and non-clicked options, using a choice-based approach and time-aware preference decay. DPL leverages ASE's preference distribution to guide the conversation and adapt to preference changes for recommendations or attribute queries. Extensive experiments demonstrate VPPL's effectiveness within VPMCR, outperforming existing methods and setting a new benchmark. Our work advances CRS by accommodating users' inherent ambiguity and relative decision-making processes, improving real-world applicability.
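To make the idea of soft estimation with time-aware decay concrete, the following is a minimal Python sketch and not the paper's formulation. It assumes a multiplicative decay of older evidence, a fixed bonus for clicked options, a fixed penalty for shown but non-clicked options, and a softmax to turn scores into a distribution. The function name `soft_estimate` and the parameters `decay`, `click_bonus`, and `nonclick_penalty` are hypothetical and chosen only for illustration.

```python
import math


def soft_estimate(turns, decay=0.5, click_bonus=1.0, nonclick_penalty=0.5):
    """Return a soft preference distribution over attributes.

    turns: list of (shown, clicked) pairs of sets, oldest turn first.
    All parameters are illustrative assumptions, not values from the paper.
    """
    scores = {}
    for shown, clicked in turns:
        # Time-aware decay (assumed multiplicative): older evidence fades.
        for attr in scores:
            scores[attr] *= decay
        # Score both clicked and non-clicked options instead of hard filtering.
        for attr in shown:
            delta = click_bonus if attr in clicked else -nonclick_penalty
            scores[attr] = scores.get(attr, 0.0) + delta
    if not scores:
        return {}
    # Softmax (assumed) keeps every seen option at non-zero probability.
    top = max(scores.values())
    exps = {attr: math.exp(s - top) for attr, s in scores.items()}
    total = sum(exps.values())
    return {attr: e / total for attr, e in exps.items()}


# Example: the user clicks "rock" and "jazz" first, then only "folk".
turns = [
    ({"rock", "jazz", "pop"}, {"rock", "jazz"}),
    ({"jazz", "folk"}, {"folk"}),
]
distribution = soft_estimate(turns)
for attr, prob in sorted(distribution.items(), key=lambda kv: -kv[1]):
    print(f"{attr}: {prob:.3f}")
```

In this sketch, non-clicked options are down-weighted rather than removed, which is the property the abstract describes as mitigating over-filtering. A policy component such as DPL would then consume a distribution of this kind when choosing between asking about attributes and recommending items.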
