SARDINE: A Simulator for Automated Recommendation in Dynamic and Interactive Environments

Romain Deffayet, Thibaut Thonet, Dongyoon Hwang, Vassilissa Lehoux, Jean-Michel Renders, Maarten de Rijke

TL;DR

SARDINE tackles the need for a controllable, interpretable simulator to study interactive recommender systems under dynamics such as multi-step consequences, data biases, uncertainty, and slate presentation. It introduces a configurable MDP-based simulator with embedding-driven users/items, a relevance-click model, boredom and influence dynamics, and full or partial observability, producing nine environments. Across extensive experiments with baselines like Random, Greedy Oracle, REINFORCE, SAC, GeMS, and HAC, the study reveals that SAC+Top-K often yields strong performance, though its success depends on high-quality item embeddings and encoder choices under partial observability. The work demonstrates the simulator’s utility for probing long-horizon effects, biases, and slate-related challenges, providing a foundation for robust, data-driven recommender research and guiding future extensions such as non-stationary and deployment-efficient learning.

Abstract

Simulators can provide valuable insights for researchers and practitioners who wish to improve recommender systems, because they allow one to easily tweak the experimental setup in which recommender systems operate, and as a result lower the cost of identifying general trends and uncovering novel findings about the candidate methods. A key requirement to enable this accelerated improvement cycle is that the simulator is able to span the various sources of complexity that can be found in the real recommendation environment that it simulates. With the emergence of interactive and data-driven methods - e.g., reinforcement learning or online and counterfactual learning-to-rank - that aim to achieve user-related goals beyond the traditional accuracy-centric objectives, adequate simulators are needed. In particular, such simulators must model the various mechanisms that render the recommendation environment dynamic and interactive, e.g., the effect of recommendations on the user or the effect of biased data on subsequent iterations of the recommender system. We therefore propose SARDINE, a flexible and interpretable recommendation simulator that can help accelerate research in interactive and data-driven recommender systems. We demonstrate its usefulness by studying existing methods within nine diverse environments derived from SARDINE, and even uncover novel insights about them.

Paper Structure

This paper contains 48 sections, 3 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Diagram summarizing the different components of the proposed SARDINE simulator, and its interaction with the recommendation agent.
  • Figure 2: Results on the SingleItem-Static (return), SingleItem-PartialObs (return), and SingleItem-BoredInf (return and boredom) environments. The colored envelope around each line indicates the 95% confidence interval around the mean, computed from 5 seeded runs. Boredom results are not shown for SingleItem-Static and SingleItem-PartialObs, as these static environments do not include a boredom component and thus all methods have a default boredom of 0.
  • Figure 3: Results on the SlateTopK-Bored environment with default, ideal item embeddings (return and boredom) and with matrix factorization item embeddings (return and boredom). The colored envelope around each line indicates the 95% confidence interval around the mean, computed from 5 seeded runs. Some approaches keep the same performance across the two settings because they either do not rely on item embeddings (Random, REINFORCE Top-K) or are an oracle baseline that only makes sense with ideal item embeddings (Greedy Oracle).
  • Figure 4: Results on the SlateTopK-BoredInf environment. The colored envelope surrounding lines indicates the 95% confidence interval around the mean computed from 5 seeded runs.
  • Figure 5: Results in terms of return ($\uparrow$) on the SlateTopK-PartialObs and SlateTopK-Uncertain environments. The click uncertainty degree ranges from low (SlateTopK-PartialObs) through medium, high, and very high (the three SlateTopK-Uncertain variants), corresponding to a scale hyperparameter $\lambda$ in the relevance function equal to 100, 10, 5, and 2, respectively (see the paper's section on the click model for more details). The colored envelope around each line indicates the 95% confidence interval around the mean, computed from 5 seeded runs.
  • ...and 5 more figures
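
The Figure 5 caption ties click uncertainty to a scale hyperparameter $\lambda$: larger values (100) give low uncertainty, smaller values (2) give very high uncertainty. The relevance function itself is not reproduced on this page, so the following is only a minimal sketch under an assumed form: a sigmoid of a relevance score, sharpened by $\lambda$. The function names, the 0.5 threshold, and the uniform grid of scores are all hypothetical and are not taken from SARDINE.

```python
import math

def click_probability(score, scale, threshold=0.5):
    """Assumed (hypothetical) click model: sigmoid of the score, sharpened by `scale`."""
    z = scale * (score - threshold)
    # Numerically stable sigmoid for both signs of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bernoulli_entropy(p):
    """Entropy in bits of a click/no-click outcome with click probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def mean_click_entropy(scale, n_points=101):
    """Average click entropy over an illustrative grid of scores in [0, 1]."""
    scores = [i / (n_points - 1) for i in range(n_points)]
    total = sum(bernoulli_entropy(click_probability(s, scale)) for s in scores)
    return total / n_points

# The four lambda values listed in the Figure 5 caption.
SCALES = [100, 10, 5, 2]
entropies = {lam: mean_click_entropy(lam) for lam in SCALES}

for lam in SCALES:
    print(f"lambda={lam:>3}: mean click entropy = {entropies[lam]:.3f} bits")
```

Under this assumed form, a large $\lambda$ pushes click probabilities toward 0 or 1, so clicks become nearly deterministic given the score, while a small $\lambda$ keeps them closer to 0.5. This matches the low-to-very-high ordering in the caption, though the paper's actual function may differ.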