SimLab: A Platform for Simulation-based Evaluation of Conversational Information Access Systems
Nolwenn Bernard, Sharath Chandra Etagi Suresh, Krisztian Balog, ChengXiang Zhai
TL;DR
SimLab addresses the persistent difficulty of evaluating conversational information access (CIA) systems reproducibly by introducing a cloud-based platform for simulation-based evaluation of both CIA agents and user simulators. The platform combines an online Evaluation Framework, containerized systems, a Systems Registry, VM management, and monitoring within a cloud-native architecture, and is demonstrated through an initial conversational movie recommendation use case with a public leaderboard. Key contributions include specifying platform requirements, detailing implementation choices (Python-based framework, Flask/React web app, MongoDB, Docker registry, Jenkins, Grafana/Prometheus), and analyzing costs on Google Cloud Platform (GCP). The work aims to accelerate research, enhance reproducibility, and support education, and invites community participation to build a self-sustaining ecosystem for CIA and user simulation research.
Abstract
Progress in conversational information access (CIA) systems has been hindered by the difficulty of evaluating such systems with reproducible experiments. While user simulation offers a promising solution, the lack of infrastructure and tooling to support this evaluation paradigm remains a significant barrier. To address this gap, we introduce SimLab, the first cloud-based platform providing a centralized solution for the community to benchmark both conversational systems and user simulators in a controlled and reproducible setting. We articulate the requirements for such a platform and propose a general infrastructure to meet them. We then present the design and implementation of an initial version of SimLab and showcase its features through a first simulation-based evaluation task in conversational movie recommendation. Furthermore, we discuss the platform's sustainability and future opportunities for development, inviting the community to drive further progress in the fields of CIA and user simulation.
