SARDINE: A Simulator for Automated Recommendation in Dynamic and Interactive Environments
Romain Deffayet, Thibaut Thonet, Dongyoon Hwang, Vassilissa Lehoux, Jean-Michel Renders, Maarten de Rijke
TL;DR
SARDINE addresses the need for a controllable, interpretable simulator for studying interactive recommender systems under sources of complexity such as multi-step consequences, data biases, uncertainty, and slate presentation. It introduces a configurable MDP-based simulator with embedding-driven users and items, a relevance-click model, boredom and influence dynamics, and full or partial observability, from which nine environments are derived. In extensive experiments with baselines including Random, Greedy Oracle, REINFORCE, SAC, GeMS, and HAC, SAC+Top-K often performs strongly, though its success depends on high-quality item embeddings and, under partial observability, on the choice of encoder. The work demonstrates the simulator’s utility for probing long-horizon effects, biases, and slate-related challenges, providing a foundation for robust, data-driven recommender research and pointing to future extensions such as non-stationary and deployment-efficient learning.
Abstract
Simulators can provide valuable insights for researchers and practitioners who wish to improve recommender systems, because they allow one to easily tweak the experimental setup in which recommender systems operate and, as a result, lower the cost of identifying general trends and uncovering novel findings about the candidate methods. A key requirement to enable this accelerated improvement cycle is that the simulator be able to span the various sources of complexity found in the real recommendation environment that it simulates. With the emergence of interactive and data-driven methods (e.g., reinforcement learning or online and counterfactual learning-to-rank) that aim to achieve user-related goals beyond the traditional accuracy-centric objectives, adequate simulators are needed. In particular, such simulators must model the various mechanisms that render the recommendation environment dynamic and interactive, e.g., the effect of recommendations on the user or the effect of biased data on subsequent iterations of the recommender system. We therefore propose SARDINE, a flexible and interpretable recommendation simulator that can help accelerate research in interactive and data-driven recommender systems. We demonstrate its usefulness by studying existing methods within nine diverse environments derived from SARDINE, and uncover novel insights about these methods.
