Identifying Breakdowns in Conversational Recommender Systems using User Simulation
Nolwenn Bernard, Krisztian Balog
TL;DR
This work tackles the robustness gap in conversational recommender systems (CRSs) by proposing a simulator-driven methodology to identify conversational breakdowns. The approach defines breakdown types, detectors, and a four-step workflow that analyzes $N$ CRS–user-simulator conversations to reveal problematic dialogue paths, enabling iterative CRS improvements. The authors demonstrate the method in a case study with IAI MovieBot and a user simulator, showing that targeted modifications can eliminate system failures and reduce other breakdown types, while also acknowledging that the user simulator itself can introduce breakdowns. The methodology is architecture-agnostic and serves as both a diagnostic tool and a development workflow for strengthening CRS robustness and evaluability.
Abstract
We present a methodology to systematically test conversational recommender systems with regard to conversational breakdowns. It involves examining conversations generated between the system and simulated users for a set of predefined breakdown types, extracting the conversational paths responsible for them, and characterizing those paths in terms of the underlying dialogue intents. User simulation offers the advantages of simplicity, cost-effectiveness, and time efficiency for obtaining conversations in which potential breakdowns can be identified. The proposed methodology can be used as a diagnostic tool as well as a development tool to improve conversational recommender systems. We apply our methodology in a case study with an existing conversational recommender system and user simulator, demonstrating that with just a few iterations, we can make the system more robust to conversational breakdowns.
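The analysis loop summarized above can be sketched in Python. It detects breakdowns, extracts the conversational path responsible for each one, and characterizes that path by its dialogue intents. This is a minimal illustration under stated assumptions, not the authors' implementation. Conversations are assumed to be lists of utterances annotated with a speaker and an intent label. The two detectors (`detect_repetition`, `detect_fallback`) and the intent names are hypothetical. A "path" is assumed to be the intents in a fixed window of utterances ending at the breakdown.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Utterance:
    speaker: str  # "user" or "agent" (assumed labels)
    intent: str   # dialogue intent label (hypothetical names)


Conversation = List[Utterance]
# A detector returns the indices of utterances at which a breakdown occurs.
Detector = Callable[[Conversation], List[int]]


def detect_repetition(conv: Conversation) -> List[int]:
    """Hypothetical detector: the agent repeats its previous intent."""
    agent_turns = [i for i, u in enumerate(conv) if u.speaker == "agent"]
    return [
        cur
        for prev, cur in zip(agent_turns, agent_turns[1:])
        if conv[prev].intent == conv[cur].intent
    ]


def detect_fallback(conv: Conversation) -> List[int]:
    """Hypothetical detector: the agent replies with a 'fallback' intent."""
    return [
        i for i, u in enumerate(conv)
        if u.speaker == "agent" and u.intent == "fallback"
    ]


def extract_path(conv: Conversation, idx: int, window: int = 2) -> Tuple[str, ...]:
    """Intent sequence of the `window` utterances preceding the breakdown,
    plus the breakdown utterance itself."""
    start = max(0, idx - window)
    return tuple(f"{u.speaker}:{u.intent}" for u in conv[start: idx + 1])


def find_breakdown_paths(
    conversations: List[Conversation],
    detectors: Dict[str, Detector],
    window: int = 2,
) -> Dict[str, Counter]:
    """Count how often each intent path leads to each breakdown type."""
    paths: Dict[str, Counter] = {name: Counter() for name in detectors}
    for conv in conversations:
        for name, detect in detectors.items():
            for idx in detect(conv):
                paths[name][extract_path(conv, idx, window)] += 1
    return paths
```

Counting identical paths per breakdown type surfaces the dialogue flows that most often precede a breakdown. Rerunning the same analysis after each change to the system gives the iterative improvement loop the methodology describes.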
