Deep Research for Recommender Systems

Kesha Ou; Chenghao Wu; Xiaolei Wang; Bowen Zheng; Wayne Xin Zhao; Weitao Li; Long Zhang; Sheng Chen; Ji-Rong Wen

TL;DR

This paper argues that the traditional tool-based paradigm for recommendation fundamentally limits user experience, because the system acts as a passive filter rather than an active assistant. It proposes a deep research paradigm for recommendation that replaces conventional item lists with comprehensive, user-centric reports.

Abstract

The technical foundations of recommender systems have progressed from collaborative filtering to complex neural models and, more recently, large language models. Despite these technological advances, deployed systems often underserve their users by simply presenting a list of items, leaving the burden of exploration, comparison, and synthesis entirely on the user. This paper argues that this traditional "tool-based" paradigm fundamentally limits user experience, as the system acts as a passive filter rather than an active assistant. To address this limitation, we propose a novel deep research paradigm for recommendation, which replaces conventional item lists with comprehensive, user-centric reports. We instantiate this paradigm through RecPilot, a multi-agent framework comprising two core components: a user trajectory simulation agent that autonomously explores the item space, and a self-evolving report generation agent that synthesizes the findings into a coherent, interpretable report tailored to support user decisions. This approach reframes recommendation as a proactive, agent-driven service. Extensive experiments on public datasets demonstrate that RecPilot not only achieves strong performance in modeling user behaviors but also generates highly persuasive reports that substantially reduce user effort in item evaluation, validating the potential of this new interaction paradigm.

Paper Structure (23 sections, 12 equations, 5 figures, 5 tables)

Introduction
Approach
Overview of the Approach
User Trajectory Simulation Agent
Generative User Trajectory Learning
Reinforcement Learning with Model-Free Process Rewards
Exploration Trajectory Generation
Self-Evolving Report Generation Agent
Agentic Report Generation
Self-Evolution for Personalization
Experiment
Experimental Setup
Evaluation on Trajectory Simulation Task
Evaluation on Report Generation Task
Related Work
...and 8 more sections

Figures (5)

Figure 1: Overview of our approach, RecPilot. The left part shows the overall pipeline: the trajectory simulation agent takes contextual information and historical interactions as input and generates a simulated trajectory, which the report generation agent uses to produce the final report. The upper-right part shows the trajectory simulation agent, which interleaves item and behavior generation and is optimized via reinforcement learning with model-free rewards. The lower-right part shows the report generation agent, which first decomposes user interests into multiple aspects for ranking and then writes the report. It characterizes user preferences through rubrics and experience, and self-evolves along these two dimensions.
Figure 2: Performance w.r.t. maximum trajectory lengths on the Tmall dataset.
Figure 3: Performance comparison w.r.t. two sampling parameters on the Tmall dataset: threshold $p$ and temperature $\tau$.
Figure 4: Detailed analysis of the report generation task on the Tmall dataset.
Figure 5: Sample interaction interfaces in existing recommender systems and our proposed RecPilot.
