Interplay: Training Independent Simulators for Reference-Free Conversational Recommendation

Jerome Ramos; Feng Xia; Xi Wang; Shubham Chatterjee; Xiao Fu; Hossein A. Rahmani; Aldo Lipani

Abstract

Training conversational recommender systems (CRS) requires extensive dialogue data, which is challenging to collect at scale. To address this, researchers have used simulated user-recommender conversations. Traditional simulation approaches often use a single large language model (LLM) that generates entire conversations with prior knowledge of the target items, which leads to scripted and artificial dialogues. We propose a reference-free simulation framework that trains two independent LLMs, one as the user and one as the conversational recommender. These models interact in real time without access to predetermined target items. Instead, the simulation is guided by preference summaries and target attributes, so the recommender must genuinely infer user preferences through dialogue. This approach produces more realistic and diverse conversations that closely mirror authentic human-AI interactions. Our reference-free simulators match or exceed existing methods in quality while offering a scalable way to generate high-quality conversational recommendation data without constraining conversations to predefined target items. Both quantitative and human evaluations confirm the effectiveness of the reference-free approach.
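The setup described in the abstract, two independently trained simulators taking turns with no shared knowledge of a target item, can be sketched as a simple alternating loop. This is a minimal illustration, not the authors' implementation. It assumes, as the outline headings "Persona Architecture for Authentic User Behavior" and "Recommender Without Oracle Knowledge" suggest, that only the user simulator receives the preference summary and target attributes. The `Persona` fields, the `[END]` stop token, the `max_exchanges` cap, and the toy stand-in simulators are all hypothetical; in the described framework each callable would wrap a separately fine-tuned LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, text), where role is "user" or "assistant"

@dataclass
class Persona:
    """Hypothetical view given only to the user simulator."""
    preference_summary: str
    target_attributes: List[str]

# Hypothetical interfaces: each callable would wrap one independently trained LLM.
UserSim = Callable[[Persona, List[Turn]], str]
RecSim = Callable[[List[Turn]], str]

def simulate_dialogue(user_sim: UserSim, rec_sim: RecSim, persona: Persona,
                      max_exchanges: int = 6, stop_token: str = "[END]") -> List[Turn]:
    """Alternate user and recommender turns until the user emits stop_token
    or max_exchanges user/recommender pairs have been generated."""
    history: List[Turn] = []
    for _ in range(max_exchanges):
        user_msg = user_sim(persona, history)
        history.append(("user", user_msg))
        if stop_token in user_msg:
            break
        # The recommender never sees the persona, only the conversation so far.
        history.append(("assistant", rec_sim(list(history))))
    return history

# Toy stand-ins for the two trained simulators (hypothetical, for illustration only).
def toy_user(persona: Persona, history: List[Turn]) -> str:
    if len(history) >= 4:
        return "That works for me, thanks! [END]"
    return f"I'm in the mood for something {persona.target_attributes[len(history) // 2]}."

def toy_rec(history: List[Turn]) -> str:
    last_user_msg = history[-1][1]
    return f"Noted: '{last_user_msg}' Could you tell me more?"

persona = Persona(preference_summary="Enjoys slow-burn sci-fi.",
                  target_attributes=["cerebral", "atmospheric"])
dialogue = simulate_dialogue(toy_user, toy_rec, persona)
for role, text in dialogue:
    print(f"{role}: {text}")
```

Keeping the simulators behind separate callables mirrors the independence the abstract emphasizes: whatever the recommender suggests can depend only on what the user has said, not on a hidden target item.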

Paper Structure (31 sections, 2 equations, 3 figures, 3 tables)

Introduction
Related Works
Methodology
The Core Insight
Breaking the Single-Target Constraint
Persona Architecture for Authentic User Behavior
Recommender Without Oracle Knowledge
Independent Training for Specialized Behavior
Structured Action Generation
Role-Specific Loss Masking
Emergent Realistic Dynamics
Experimental Setup
Experimental Details
Baselines
Evaluation Metrics
...and 16 more sections

Figures (3)

Figure 1: Unlike prior work that uses templated dialogues and target-aware static LLMs, our method independently trains user and recommender simulators without predetermined items or actions, enabling diverse, realistic interactions.
Figure 2: A single PEARL dialogue is annotated with structural tokens to train two separate simulators: (left) masks assistant turns and keeps user turns to train UserSim; (right) masks user turns and keeps assistant turns to train RecSim.
Figure 3: Human evaluation win ratio comparing reference-free dialogues to the PEARL dataset (* indicates p < 0.05).
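The role-specific masking in Figure 2, where one annotated dialogue yields training targets for two separate simulators, can be illustrated with a small label-construction helper. This sketch assumes hypothetical `<user>` and `<assistant>` marker tokens standing in for the paper's structural tokens. It uses -100, the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`, to exclude masked positions from the loss. The paper's actual tokens and training code may differ.

```python
from typing import Dict, List

# -100 is the default ignore_index of torch.nn.CrossEntropyLoss, so positions
# labelled with it contribute nothing to the training loss.
IGNORE_INDEX = -100

# Hypothetical structural tokens marking the start of each speaker's turn.
ROLE_MARKERS: Dict[str, str] = {"<user>": "user", "<assistant>": "assistant"}

def assign_roles(tokens: List[str]) -> List[str]:
    """Tag every token with the speaker of the turn it belongs to.

    A marker token is tagged with the role of the turn it opens; tokens that
    appear before any marker are tagged "other" and are always masked.
    """
    roles: List[str] = []
    current = "other"
    for tok in tokens:
        current = ROLE_MARKERS.get(tok, current)
        roles.append(current)
    return roles

def build_labels(token_ids: List[int], roles: List[str], train_role: str) -> List[int]:
    """Keep labels only for train_role tokens; replace the rest with IGNORE_INDEX.

    train_role="user" yields UserSim targets (assistant turns masked);
    train_role="assistant" yields RecSim targets (user turns masked).
    """
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [tid if role == train_role else IGNORE_INDEX
            for tid, role in zip(token_ids, roles)]

# Toy dialogue; token ids are just positions for illustration.
tokens = ["<user>", "any", "sci-fi", "?", "<assistant>", "try", "Arrival", "<user>", "thanks"]
token_ids = list(range(len(tokens)))
roles = assign_roles(tokens)

user_labels = build_labels(token_ids, roles, train_role="user")
rec_labels = build_labels(token_ids, roles, train_role="assistant")
print(user_labels)
print(rec_labels)
```

The same input sequence is reused for both simulators; only the labels differ. Each model is therefore trained to produce its own role's turns while still conditioning on the full conversation.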