UserSimCRS v2: Simulation-Based Evaluation for Conversational Recommender Systems

Nolwenn Bernard; Krisztian Balog

TL;DR

This paper addresses the challenge of evaluating conversational recommender systems (CRSs) in a scalable, reproducible way. It introduces UserSimCRS v2, an upgraded framework that combines an enhanced agenda-based user simulator with two LLM-based simulators, unified data formats, broader CRS integration, and an LLM-based evaluation utility. The framework supports multiple benchmark datasets (e.g., ReDial, INSPIRED, IARD) and connects to CRSs through a CRS Arena interface. A movie recommendation case study shows substantial variability in results across simulators and datasets. These contributions lower the barrier to simulation-based evaluation, support more robust, multifaceted assessment of CRSs and user models, and open directions for further benchmarking and user simulation research.

Abstract

Resources for simulation-based evaluation of conversational recommender systems (CRSs) are scarce. The UserSimCRS toolkit was introduced to address this gap. In this work, we present UserSimCRS v2, a significant upgrade aligning the toolkit with state-of-the-art research. Key extensions include an enhanced agenda-based user simulator, the introduction of large language model (LLM)-based simulators, integration with a wider range of CRSs and datasets, and new LLM-as-a-judge evaluation utilities. We demonstrate these extensions in a case study.
