Leveraging the Power of Conversations: Optimal Key Term Selection in Conversational Contextual Bandits

Maoli Liu; Zhuohua Li; Xiangxiang Dai; John C. S. Lui

TL;DR

This work tackles key-term selection in conversational contextual bandits by proposing three algorithms: CLiSK, which uses smoothed key-term contexts; CLiME, which initiates conversations adaptively; and CLiSK-ME, which combines both techniques. The authors prove regret bounds of $O(\sqrt{dT\log T})$ for CLiME and CLiSK-ME and $O(\sqrt{dT\log T} + d)$ for CLiSK, along with a matching lower bound of $\Omega(\sqrt{dT})$, establishing near-minimax optimality. Empirical results on synthetic and real-world datasets show at least a 14.6% reduction in cumulative regret, together with efficient interaction, faster preference estimation, and reasonable running times. The work thus offers a theoretically grounded approach to improving exploration and adaptivity in conversational recommendation, with practical relevance for real-world conversational recommender systems.

Abstract

Conversational recommender systems proactively query users with relevant "key terms" and leverage the feedback to elicit users' preferences for personalized recommendations. Conversational contextual bandits, a prevalent approach in this domain, aim to optimize preference learning by balancing exploitation and exploration. However, several limitations hinder their effectiveness in real-world scenarios. First, existing algorithms employ key term selection strategies with insufficient exploration, often failing to thoroughly probe users' preferences and resulting in suboptimal preference estimation. Second, current algorithms typically rely on deterministic rules to initiate conversations, causing unnecessary interactions when preferences are well-understood and missed opportunities when preferences are uncertain. To address these limitations, we propose three novel algorithms: CLiSK, CLiME, and CLiSK-ME. CLiSK introduces smoothed key term contexts to enhance exploration in preference learning, CLiME adaptively initiates conversations based on preference uncertainty, and CLiSK-ME integrates both techniques. We theoretically prove that all three algorithms achieve a tighter regret upper bound of $O(\sqrt{dT\log{T}})$ with respect to the time horizon $T$, improving upon existing methods. Additionally, we provide a matching lower bound $\Omega(\sqrt{dT})$ for conversational bandits, demonstrating that our algorithms are nearly minimax optimal. Extensive evaluations on both synthetic and real-world datasets show that our approaches achieve at least a 14.6% improvement in cumulative regret.
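
The near-optimality claim can be made concrete by comparing the two stated bounds directly. The following is a rough worked comparison, not taken from the paper: it ignores the constants hidden by the $O(\cdot)$ and $\Omega(\cdot)$ notation, and the numerical value assumes a natural logarithm and an illustrative horizon of $T = 10^{6}$.

$$
\frac{\sqrt{dT\log T}}{\sqrt{dT}} = \sqrt{\log T}, \qquad \sqrt{\ln 10^{6}} \approx \sqrt{13.8} \approx 3.7 .
$$

Under these assumptions, the upper and lower bounds differ only by a $\sqrt{\log T}$ factor, which does not depend on the dimension $d$ and grows very slowly with $T$. This is the sense in which the algorithms are nearly, rather than exactly, minimax optimal.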
