Synthetic Dialogue Generation for Interactive Conversational Elicitation & Recommendation (ICER)
Moonkyung Ryu, Chih-Wei Hsu, Yinlam Chow, Mohammad Ghavamzadeh, Craig Boutilier
TL;DR
ICER tackles data scarcity in conversational recommender systems (CRSs) by integrating a behavior-grounded user simulator with language-model (LM) prompted dialogue refinement to generate realistic, multi-turn conversations. It introduces MD-DICER, a 100K MovieLens-based CRS dataset, and demonstrates that LM prompting yields more natural, consistent, and informative dialogues than template-based generation. Human and automated evaluations show that ICER-generated dialogues enhance user modeling and downstream LM-based recommendations, especially when preference elicitation and critiquing are involved. The work offers a reproducible methodology and dataset to advance behaviorally aware language-agent CRSs in data-scarce settings.
Abstract
While language models (LMs) offer great potential for conversational recommender systems (CRSs), the paucity of public CRS data makes fine-tuning LMs for CRSs challenging. In response, LMs as user simulators qua data generators can be used to train LM-based CRSs, but often lack behavioral consistency, generating utterance sequences inconsistent with those of any real user. To address this, we develop a methodology for generating natural dialogues that are consistent with a user's underlying state using behavior simulators together with LM-prompting. We illustrate our approach by generating a large, open-source CRS data set with both preference elicitation and example critiquing. Rater evaluation on some of these dialogues shows them to exhibit considerable consistency, factuality and naturalness.
