RecoWorld: Building Simulated Environments for Agentic Recommender Systems

Fei Liu, Xinyu Lin, Hanchao Yu, Mingyuan Wu, Jianyu Wang, Qiang Zhang, Zhuokai Zhao, Yinglong Xia, Yao Zhang, Weiwei Li, Mingze Gao, Qifan Wang, Lizhu Zhang, Benyu Zhang, Xiangjun Fan

TL;DR

RecoWorld provides a Gym-like simulated environment for training and evaluating instruction-following, agentic recommender systems. It couples an LLM-powered user simulator with an autonomous recommender in multi-turn sessions. It introduces dynamic memory and mindset-update mechanisms to model lifelong user behavior, and offers three engagement-history representations (text-based, multimodal, and semantic IDs) to make simulations realistic yet scalable. The framework supports autonomous agents capable of perception, reasoning, action, and memory, optimized via RL with flexible reward signals and voice-driven feedback, and it extends naturally to multi-agent populations to model collective dynamics. This setup enables safe, rapid experimentation with bold strategies and improves evaluation beyond offline metrics; it has also drawn strong industry interest as a stepping stone toward real-world, instruction-aligned personalized information streams.
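The session loop summarized above can be sketched as a classic Gym-style interface (`reset()` returning an observation, `step()` returning a four-tuple). This is a minimal illustration under stated assumptions, not RecoWorld's code: `SimulatedUser` and `RecoEnv` are hypothetical names, a rule-based interest update stands in for the LLM-driven mindset update, a `patience` score stands in for disengagement detection, and the reward of 1.0 per engaged item is an assumed signal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SimulatedUser:
    """Hypothetical rule-based stand-in for an LLM-powered user simulator."""
    interests: Dict[str, float]                      # "mindset": topic -> interest in [0, 1]
    patience: float = 1.0                            # falls as off-target items accumulate
    memory: List[str] = field(default_factory=list)  # engagement history (topics seen)

    def review(self, item_topic: str) -> Tuple[bool, Optional[str]]:
        """Review one item; return (engaged, instruction or None)."""
        self.memory.append(item_topic)
        interest = self.interests.get(item_topic, 0.0)
        engaged = interest >= 0.5
        # Mindset update (assumed rule): each exposure causes slight fatigue on that topic.
        self.interests[item_topic] = max(0.0, interest - 0.1)
        if engaged:
            self.patience = min(1.0, self.patience + 0.2)
        else:
            self.patience -= 0.4
        # When disengagement looms, emit a reflective instruction toward the top interest.
        instruction = None
        if self.patience < 0.5:
            favorite = max(self.interests, key=self.interests.get)
            instruction = f"Show me more {favorite}, please."
        return engaged, instruction


class RecoEnv:
    """Minimal Gym-like session: reset() -> obs; step(item) -> (obs, reward, done, info)."""

    def __init__(self, interests: Dict[str, float], max_turns: int = 10):
        self._init_interests = dict(interests)
        self.max_turns = max_turns

    def reset(self) -> Dict:
        self.user = SimulatedUser(dict(self._init_interests))
        self.turn = 0
        return {"history": [], "instruction": None}

    def step(self, item_topic: str):
        self.turn += 1
        engaged, instruction = self.user.review(item_topic)
        reward = 1.0 if engaged else 0.0  # assumed engagement reward
        # Session ends when the simulated user leaves or the turn budget runs out.
        done = self.user.patience <= 0.0 or self.turn >= self.max_turns
        obs = {"history": list(self.user.memory), "instruction": instruction}
        return obs, reward, done, {"patience": self.user.patience}
```

For example, after `env = RecoEnv({"cooking": 0.9, "sports": 0.2})` and `env.reset()`, two calls to `env.step("sports")` push patience below the threshold, and the second observation carries the instruction "Show me more cooking, please." that an instruction-following recommender would be expected to act on.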

Abstract

We present RecoWorld, a blueprint for building simulated environments tailored to agentic recommender systems. Such environments give agents a proper training space where they can learn from errors without impacting real users. RecoWorld distinguishes itself with a dual-view architecture: a simulated user and an agentic recommender engage in multi-turn interactions aimed at maximizing user retention. The user simulator reviews recommended items, updates its mindset, and, when it senses potential user disengagement, generates reflective instructions. The agentic recommender adapts its recommendations by incorporating these user instructions and reasoning traces, creating a dynamic feedback loop that actively engages users. This process leverages the exceptional reasoning capabilities of modern LLMs. We explore diverse content representations within the simulator, including text-based, multimodal, and semantic ID modeling, and discuss how multi-turn RL enables the recommender to refine its strategies through iterative interactions. RecoWorld also supports multi-agent simulations, allowing creators to simulate the responses of targeted user populations. It marks an important first step toward recommender systems where users and agents collaboratively shape personalized information streams. We envision new interaction paradigms where "user instructs, recommender responds," jointly optimizing user retention and engagement.
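The abstract does not state the RL objective, but the multi-turn idea can be illustrated with standard discounted returns over a single session. This sketch assumes a hypothetical reward design (per-turn engagement rewards plus a retention bonus on the final turn when the simulated user stays) and a mean-return baseline as in REINFORCE; `session_returns`, `advantages`, `gamma`, and `retention_bonus` are illustrative names, and the paper's actual reward signals may differ.

```python
from typing import List


def session_returns(turn_rewards: List[float], user_stayed: bool,
                    gamma: float = 0.9, retention_bonus: float = 1.0) -> List[float]:
    """Discounted return G_t = r_t + gamma * G_{t+1} for each turn of one session.

    Hypothetical reward design: per-turn engagement rewards, plus a bonus on the
    last turn if the simulated user did not leave the session early.
    """
    rewards = list(turn_rewards)
    if rewards and user_stayed:
        rewards[-1] += retention_bonus
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate backwards so each turn is credited with everything that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns: List[float]) -> List[float]:
    """Subtract the mean return as a simple variance-reducing baseline."""
    if not returns:
        return []
    baseline = sum(returns) / len(returns)
    return [g - baseline for g in returns]
```

For example, `session_returns([1.0, 0.0, 1.0], user_stayed=True, gamma=0.5)` returns `[1.5, 1.0, 2.0]`: the final turn, which carries the retention bonus, receives the largest return, and `advantages` of that list is `[0.0, -0.5, 0.5]`.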

Paper Structure

This paper contains 9 sections, 1 equation, 3 figures, 2 tables.

Figures (3)

  • Figure 1: A simulated user interacts with an agentic RecSys over multiple turns within a session.
  • Figure 2: Three modeling alternatives for engagement history that leverage LLMs' powerful reasoning capabilities (see the user simulator section).
  • Figure 3: An instruction-following recommender can be powered by an autonomous agent.
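Of the three engagement-history representations in Figure 2, semantic IDs are the least self-explanatory. One common way to build them (an assumption here; this summary does not specify RecoWorld's scheme) is residual quantization: each item embedding is mapped, level by level, to a short tuple of codebook indices. The function name `semantic_id` and the `CODEBOOKS` values are hypothetical toy choices; in practice the codebooks would be learned, for example with an RQ-VAE.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]

# Toy two-level codebooks (hypothetical values; normally learned from item embeddings).
CODEBOOKS: List[List[Vector]] = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],    # level 1: coarse clusters
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]],   # level 2: finer offsets
]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def semantic_id(embedding: Vector, codebooks: List[List[Vector]]) -> Tuple[int, ...]:
    """Map an item embedding to a tuple of codeword indices by residual quantization.

    At each level, pick the codeword nearest to the current residual, then subtract
    it so the next level encodes what is left over.
    """
    residual = list(embedding)
    code: List[int] = []
    for codebook in codebooks:
        idx = min(range(len(codebook)), key=lambda k: _sq_dist(residual, codebook[k]))
        code.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(code)
```

Items with nearby embeddings tend to share a prefix: `semantic_id([1.2, 1.0], CODEBOOKS)` gives `(1, 1)` and `semantic_id([1.0, 1.3], CODEBOOKS)` gives `(1, 2)`. This shared, coarse-to-fine structure is the property that makes such IDs a compact token vocabulary for an LLM reading a long engagement history.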