Vague Preference Policy Learning for Conversational Recommendation

Gangyi Zhang; Chongming Gao; Wenqiang Lei; Xiaojie Guo; Shijun Li; Hongshen Chen; Zhuozhi Ding; Sulong Xu; Lingfei Wu

TL;DR

The paper tackles the limitation of assuming clear user preferences in conversational recommender systems by introducing Vague Preference Multi-round Conversational Recommendation (VPMCR). It proposes VPPL, a two-component framework combining Ambiguity-aware Soft Estimation (ASE) for soft, time-decayed preference distributions and Dynamism-aware Policy Learning (DPL) that uses a graph-based conversation model and action pruning to guide reinforcement learning-based decisions. Extensive experiments on four real-world datasets show VPPL outperforms baselines in SR@15, hDCG@(15,10), and efficiency, validating its ability to preserve diversity and adapt to evolving user preferences. The approach advances CRS by explicitly modeling vagueness and relative decision-making, offering practical improvements for real-world conversational search and recommendation systems.

Abstract

Conversational recommendation systems (CRS) commonly assume users have clear preferences, leading to potential over-filtering of relevant alternatives. However, users often exhibit vague, non-binary preferences. We introduce the Vague Preference Multi-round Conversational Recommendation (VPMCR) scenario, employing a soft estimation mechanism to accommodate users' vague and dynamic preferences while mitigating over-filtering. In VPMCR, we propose Vague Preference Policy Learning (VPPL), consisting of Ambiguity-aware Soft Estimation (ASE) and Dynamism-aware Policy Learning (DPL). ASE captures preference vagueness by estimating scores for clicked and non-clicked options, using a choice-based approach and time-aware preference decay. DPL leverages ASE's preference distribution to guide the conversation and adapt to preference changes for recommendations or attribute queries. Extensive experiments demonstrate VPPL's effectiveness within VPMCR, outperforming existing methods and setting a new benchmark. Our work advances CRS by accommodating users' inherent ambiguity and relative decision-making processes, improving real-world applicability.
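The soft estimation idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's ASE formulation: the multiplicative decay, the click and non-click weights, the softmax normalization, and the function names `soft_preference_scores` and `to_distribution` are all assumptions chosen for illustration. The sketch only shows how clicked and non-clicked options can both update a score, how older evidence can fade, and how every option keeps non-zero probability instead of being filtered out.

```python
import math
from collections import defaultdict


def soft_preference_scores(turns, decay=0.5, click_weight=1.0, nonclick_weight=-0.5):
    """Accumulate time-decayed scores from clicked and non-clicked attributes.

    `turns` is a list of (clicked, non_clicked) attribute lists, oldest first.
    The weights and the multiplicative decay are illustrative assumptions,
    not values or equations taken from the paper.
    """
    scores = defaultdict(float)
    for clicked, non_clicked in turns:
        # Fade evidence from earlier turns before adding the new turn.
        for attr in list(scores):
            scores[attr] *= decay
        # Clicked options raise the score; non-clicked options lower it softly.
        for attr in clicked:
            scores[attr] += click_weight
        for attr in non_clicked:
            scores[attr] += nonclick_weight
    return dict(scores)


def to_distribution(scores):
    """Turn raw scores into a probability distribution with a softmax (assumed)."""
    top = max(scores.values())
    exps = {attr: math.exp(score - top) for attr, score in scores.items()}
    total = sum(exps.values())
    return {attr: value / total for attr, value in exps.items()}


if __name__ == "__main__":
    # Hypothetical two-turn conversation: "rock" is skipped first, clicked later.
    history = [(["jazz"], ["rock"]), (["rock"], ["pop"])]
    print(to_distribution(soft_preference_scores(history)))
```

In this toy history, "rock" is not clicked in the first turn but is clicked in the second; because its score was only lowered rather than removed, it can recover, which is the over-filtering behavior the soft estimate is meant to avoid.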

Paper Structure (36 sections, 17 equations, 7 figures, 7 tables)

Introduction
Related Work
Conversational recommendation system
RL-based Recommendation
Graph-based Recommendation
Problem Definition
Notations and Symbols
Methodology
Ambiguity-aware Soft Estimation
Preference Extraction with Choice-based Approach
Time-aware Preference Decay
Dynamism-aware Policy Learning (DPL)
Graph-based Conversation Modeling
Vague Preference Policy Learning
DQN Training
...and 21 more sections

Figures (7)

Figure 1: A simple illustration contrasting user preference modeling under the MIMCR and VPMCR scenarios. In the MIMCR scenario (Figure (a)), non-clicked attributes may cause potential target items to be removed prematurely, producing a sudden and possibly erroneous narrowing of the user's preference distribution, as depicted on the left of Figure (b). In the VPMCR scenario (Figure (c)), both clicked and non-clicked attributes contribute to the evolution of a soft preference distribution across the entire item space, which can accommodate vague or dynamic user preferences. Unlike MIMCR, VPMCR retains the potential to recommend preferred items, as shown on the right of Figure (b).
Figure 2: The Vague Preference Policy Learning (VPPL) solution for the VPMCR scenario. The model comprises two modules: Ambiguity-aware Soft Estimation (ASE) and Dynamism-aware Policy Learning (DPL). ASE models the user’s vague preferences and their decay during the conversation. DPL uses ASE’s output to construct a dynamic graph that represents the conversation state and prunes the action space for efficient policy learning. The goal is to adapt to the user’s vague or dynamic preferences.
Figure 3: SR* of compared methods at different turns on four datasets (RQ1)
Figure 4: Performance (SR@15) of the Random and Ranking strategies for vague preference initialization under varying proportions of vague preferences on the Yelp (left) and Amazon-Book (right) datasets.
Figure 5: Comparative performance analysis of Success Rate with varying decay factor (left) and proportion of vague preference (right) hyperparameters.
...and 2 more figures
