Table of Contents
Fetching ...

Interplay: Training Independent Simulators for Reference-Free Conversational Recommendation

Jerome Ramos, Feng Xia, Xi Wang, Shubham Chatterjee, Xiao Fu, Hossein A. Rahmani, Aldo Lipani

Abstract

Training conversational recommender systems (CRS) requires extensive dialogue data, which is challenging to collect at scale. To address this, researchers have used simulated user-recommender conversations. Traditional simulation approaches often utilize a single large language model (LLM) that generates entire conversations with prior knowledge of the target items, leading to scripted and artificial dialogues. We propose a reference-free simulation framework that trains two independent LLMs, one as the user and one as the conversational recommender. These models interact in real-time without access to predetermined target items, but preference summaries and target attributes, enabling the recommender to genuinely infer user preferences through dialogue. This approach produces more realistic and diverse conversations that closely mirror authentic human-AI interactions. Our reference-free simulators match or exceed existing methods in quality, while offering a scalable solution for generating high-quality conversational recommendation data without constraining conversations to pre-defined target items. We conduct both quantitative and human evaluations to confirm the effectiveness of our reference-free approach.

Interplay: Training Independent Simulators for Reference-Free Conversational Recommendation

Abstract

Training conversational recommender systems (CRS) requires extensive dialogue data, which is challenging to collect at scale. To address this, researchers have used simulated user-recommender conversations. Traditional simulation approaches often utilize a single large language model (LLM) that generates entire conversations with prior knowledge of the target items, leading to scripted and artificial dialogues. We propose a reference-free simulation framework that trains two independent LLMs, one as the user and one as the conversational recommender. These models interact in real-time without access to predetermined target items, but preference summaries and target attributes, enabling the recommender to genuinely infer user preferences through dialogue. This approach produces more realistic and diverse conversations that closely mirror authentic human-AI interactions. Our reference-free simulators match or exceed existing methods in quality, while offering a scalable solution for generating high-quality conversational recommendation data without constraining conversations to pre-defined target items. We conduct both quantitative and human evaluations to confirm the effectiveness of our reference-free approach.
Paper Structure (31 sections, 2 equations, 3 figures, 3 tables)

This paper contains 31 sections, 2 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Unlike prior work that uses templated dialogues and target-aware static LLMs, our method independently trains user and recommender simulators without predetermined items or actions, enabling diverse, realistic interactions.
  • Figure 2: A single PEARL dialogue is annotated with structural tokens to train two separate simulators: (left) masks assistant turns and keeps user turns to train UserSim; (right) masks user turns and keeps assistant turns to train RecSim.
  • Figure 3: Human evaluation win ratio comparing reference-free dialogues to PEARL dataset. (* indicates p-value $<$ 0.05)