Chain-of-Choice Hierarchical Policy Learning for Conversational Recommendation

Wei Fan; Weijia Zhang; Weiqi Wang; Yangqiu Song; Hao Liu

TL;DR

This work introduces Multi-Type-Attribute Multi-round Conversational Recommendation (MTAMCR), a realistic setting in which a conversational recommender system (CRS) can ask about multiple attribute types within each round. It proposes Chain-of-Choice Hierarchical Policy Learning (CoCHPL), a hierarchical RL framework in which a long-term policy over options (ask or recommend) selects the action type and short-term intra-option policies generate chains of attribute or item choices, aided by a dynamic-graph state representation and a feedback-prediction module. The model is trained in an option-based MDP with a dueling Q-network and termination gradients. Across four benchmark datasets, CoCHPL outperforms prior state-of-the-art methods in success rate, interaction efficiency, and ranking quality, and shows better attribute diversity and dependency modeling within turns.
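As a reminder of what a dueling Q-network computes, the sketch below shows the commonly used mean-subtracted aggregation of a state value and per-action advantages. It is only an illustration: the function name `dueling_q_values` is hypothetical, and whether CoCHPL uses exactly this aggregation is an assumption, not something stated in the summary.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) with the mean-subtracted form.

    Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')
    Assumption: this is the standard dueling aggregation, not necessarily
    the exact variant used by the paper.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# Toy numbers: one state value and three candidate choices.
q_values = dueling_q_values(1.0, [0.5, -0.5, 0.0])
best_choice = max(range(len(q_values)), key=q_values.__getitem__)
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, so the ranking of choices depends only on the advantages.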

Abstract

Conversational Recommender Systems (CRS) illuminate user preferences via multi-round interactive dialogues, ultimately navigating towards precise and satisfactory recommendations. However, contemporary CRS are limited to asking binary or multi-choice questions based on a single attribute type (e.g., color) per round, which causes excessive rounds of interaction and diminishes the user's experience. To address this, we propose a more realistic and efficient conversational recommendation problem setting, called Multi-Type-Attribute Multi-round Conversational Recommendation (MTAMCR), which enables a CRS to ask multi-choice questions covering multiple types of attributes in each round, thereby improving interactive efficiency. Moreover, by formulating MTAMCR as a hierarchical reinforcement learning task, we propose a Chain-of-Choice Hierarchical Policy Learning (CoCHPL) framework to enhance both the questioning efficiency and recommendation effectiveness in MTAMCR. Specifically, a long-term policy over options (i.e., ask or recommend) determines the action type, while two short-term intra-option policies sequentially generate the chain of attributes or items through multi-step reasoning and selection, optimizing the diversity and interdependence of the queried attributes. Finally, extensive experiments on four benchmarks demonstrate the superior performance of CoCHPL over prevailing state-of-the-art methods.
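The control flow described in the abstract can be sketched as follows: a long-term policy picks an option (ask or recommend), then a short-term intra-option policy extends the chain one choice at a time, with predicted user feedback updating the working state between choices. This is a minimal sketch under stated assumptions, not the authors' implementation: every name here (`generate_chain`, `toy_option`, `POLICIES`, the attribute and item strings) is hypothetical, and the learned Q-networks, feedback predictor, and termination function are replaced by hand-written toy rules.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[str, ...]  # hypothetical state: choices assumed accepted so far


def generate_chain(
    state: State,
    choose_option: Callable[[State], str],
    intra_policies: Dict[str, Callable[[State, List[str]], str]],
    should_stop: Callable[[State, List[str]], bool],
    predict_accept: Callable[[State, str], bool],
    max_len: int = 3,
) -> Tuple[str, List[str]]:
    """Return (option, chain of choices) for one conversational turn."""
    option = choose_option(state)  # long-term policy over options
    chain: List[str] = []
    while len(chain) < max_len:
        choice = intra_policies[option](state, chain)  # short-term policy
        chain.append(choice)
        # Predicted feedback updates the working state before the next choice.
        if predict_accept(state, choice):
            state = state + (choice,)
        if should_stop(state, chain):  # stand-in for a learned termination
            break
    return option, chain


# Toy stand-ins for the learned components (assumptions for illustration).
ATTRIBUTES = ["color:red", "brand:acme", "size:large"]
ITEMS = ["item_1", "item_2", "item_3"]


def toy_option(state: State) -> str:
    return "ask" if len(state) < 2 else "recommend"


def toy_ask(state: State, chain: List[str]) -> str:
    # Prefer an attribute type not yet covered, to keep the chain diverse.
    used = {c.split(":")[0] for c in list(state) + chain}
    return next((a for a in ATTRIBUTES if a.split(":")[0] not in used), ATTRIBUTES[0])


def toy_recommend(state: State, chain: List[str]) -> str:
    return next((i for i in ITEMS if i not in chain), ITEMS[0])


POLICIES = {"ask": toy_ask, "recommend": toy_recommend}
stop_after_two = lambda state, chain: len(chain) >= 2
always_accept = lambda state, choice: True

first_turn = generate_chain((), toy_option, POLICIES, stop_after_two, always_accept)
```

In this toy run the first turn asks about two different attribute types in a single round, which is the behaviour the MTAMCR setting is meant to allow.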

Paper Structure (30 sections, 16 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Multi-round Conversational Recommendation
The Options Framework
Preliminary
Multi-Type-Attribute Multi-round Conversational Recommendation
Chain-of-Choice Hierarchical Policy Learning
Option-based MDP Environment
State-Option Space
Action Space
Transition
Reward
Dynamic-Graph State Representation Learning
Graph Construction
State Representation
...and 15 more sections

Figures (6)

Figure 1: Example of different conversational recommendation settings.
Figure 2: During each turn, the agent engages in a decision-making process where it selects a choice and then predicts the user's feedback in order to infer the subsequent state. With the predicted state, the agent selects the next choice, continually repeating this process until the round eventually reaches its termination point.
Figure 3: The overview of Chain-of-Choice Hierarchical Policy Learning.
Figure 4: Comparisons at different conversation turns on four datasets.
Figure 5: Performance comparison for different numbers of asked attribute instances on the Yelp and Amazon-Book datasets.
...and 1 more figure
