Interactive Recommendation Agent with Active User Commands

Jiakai Tang, Yujie Luo, Xunke Xi, Fei Sun, Xueyang Feng, Sunhao Dai, Chao Yi, Dian Chen, Zhujin Gao, Yang Li, Xu Chen, Wen Chen, Jian Wu, Yuning Jiang, Bo Zheng

TL;DR

This work identifies a gap in traditional recommender systems caused by passive feedback and proposes an Interactive Recommendation Feed (IRF) where users issue natural language commands to actively steer recommendations. It introduces RecBot, a dual-agent framework (Parser and Planner) plus a modular toolset and a simulation-augmented knowledge distillation pipeline to enable real-time, command-aware policy adaptation with production viability. The approach achieves state-of-the-art offline performance and strong long-term online gains across multiple datasets, demonstrating improvements in user satisfaction (reduced NFF) and business metrics (GMV, ATC, PV) while enhancing content diversity. The findings illustrate the practical value of direct user-system communication in recommender systems and point to scalable deployment via knowledge distillation and modular tool orchestration for complex, multimodal preferences.

Abstract

Traditional recommender systems rely on passive feedback mechanisms that limit users to simple choices such as like and dislike. However, these coarse-grained signals fail to capture users' nuanced behavioral motivations and intentions. In turn, current systems also cannot distinguish which specific item attributes drive user satisfaction or dissatisfaction, resulting in inaccurate preference modeling. These fundamental limitations create a persistent gap between user intentions and system interpretations, ultimately undermining user satisfaction and harming system effectiveness. To address these limitations, we introduce the Interactive Recommendation Feed (IRF), a pioneering paradigm that enables natural language commands within mainstream recommendation feeds. Unlike traditional systems that confine users to passive, implicit behavioral influence, IRF gives users active, explicit control over recommendation policies through real-time linguistic commands. To support this paradigm, we develop RecBot, a dual-agent architecture where a Parser Agent transforms linguistic expressions into structured preferences and a Planner Agent dynamically orchestrates adaptive tool chains for on-the-fly policy adjustment. To enable practical deployment, we employ simulation-augmented knowledge distillation to achieve efficient performance while maintaining strong reasoning capabilities. Through extensive offline and long-term online experiments, RecBot shows significant improvements in both user satisfaction and business outcomes.

Paper Structure

This paper contains 43 sections, 18 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Comparison between traditional and novel interactive recommendation feeds. (a) Traditional systems rely on constrained and implicit feedback signals (e.g., likes/dislikes), making it difficult to accurately infer users' true intentions. (b) Our interactive paradigm enables free-form natural language commands, where RecBot responds and adjusts recommendation policy on-the-fly based on active user commands.
  • Figure 2: Overview of the RecBot framework for interactive recommendation. The framework comprises a Parser Agent that transforms the user's natural language command $c_t$ into structured preferences $P_{t+1}$, and a Planner Agent that orchestrates tool chains to dynamically adjust recommendation policies and generate the next feed $R_{t+1}$.
  • Figure 3: Illustration of the Parser for user intent understanding. The Parser integrates the historical preference memory $P_t$, the current recommendation feed $R_t$, and the active user command $c_t$ to generate a new preference representation $P_{t+1}$ through structured parsing and dynamic memory consolidation.
  • Figure 4: Illustration of the Planner for on-the-fly recommendation policy adaptation. The Planner dynamically constructs optimal tool invocation sequences based on the parsed user preferences $P_{t+1}$ to compute updated item scores $s_{\mathrm{final}}$ for the next recommendation feed $R_{t+1}$.
  • Figure 5: Offline ablation study results on the Amazon dataset. All numerical values on the axes are percentages (the percent sign is omitted for conciseness).
  • ...and 3 more figures
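
The data flow described in the captions of Figures 2 to 4 (command $c_t$ and memory $P_t$ in, preferences $P_{t+1}$ out, then a tool chain producing $s_{\mathrm{final}}$ and the next feed $R_{t+1}$) can be illustrated with a toy sketch. This is not the authors' implementation: the `Preferences` schema, the keyword-based `parse_command` (standing in for the LLM-based Parser Agent), the two tools `filter_tool` and `match_tool`, and the item data are all hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Preferences:
    """Structured preference memory P_t (hypothetical schema)."""
    positive: set = field(default_factory=set)  # attributes the user asks for
    negative: set = field(default_factory=set)  # attributes the user rejects


def parse_command(prefs, feed, command):
    """Toy stand-in for the Parser Agent: (P_t, R_t, c_t) -> P_{t+1}.

    Assumption: commands use the keywords "more <attr>" and "no <attr>".
    The current feed R_t is accepted to mirror the interface in Figure 3,
    but this toy parser does not use it.
    """
    pos, neg = set(prefs.positive), set(prefs.negative)
    tokens = command.lower().replace(",", " ").split()
    for marker, attr in zip(tokens, tokens[1:]):
        if marker == "more":
            pos.add(attr)
            neg.discard(attr)  # a newer command overrides older memory
        elif marker == "no":
            neg.add(attr)
            pos.discard(attr)
    return Preferences(pos, neg)


def filter_tool(items, scores, prefs):
    """Hard constraint: drop items carrying any rejected attribute."""
    return {i: s for i, s in scores.items() if not (items[i] & prefs.negative)}


def match_tool(items, scores, prefs):
    """Soft preference: add 1 per requested attribute an item carries."""
    return {i: s + len(items[i] & prefs.positive) for i, s in scores.items()}


def plan_and_rank(items, base_scores, prefs, k=2):
    """Toy stand-in for the Planner Agent: P_{t+1} -> R_{t+1}.

    Builds a tool chain from the parsed preferences, applies it to the
    base scores to obtain s_final, and returns the top-k item ids.
    """
    chain = []
    if prefs.negative:
        chain.append(filter_tool)
    if prefs.positive:
        chain.append(match_tool)
    scores = dict(base_scores)
    for tool in chain:
        scores = tool(items, scores, prefs)
    # Sort by descending score, breaking ties by item id.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


# Hypothetical catalogue: item id -> attribute set, plus base model scores.
items = {
    "a": {"red", "cotton"},
    "b": {"blue", "cotton"},
    "c": {"blue", "wool"},
    "d": {"red", "wool"},
}
base_scores = {"a": 0.9, "b": 0.5, "c": 0.6, "d": 0.8}

prefs0 = Preferences()
feed0 = plan_and_rank(items, base_scores, prefs0)       # R_t
prefs1 = parse_command(prefs0, feed0, "no red, more cotton")  # P_{t+1}
feed1 = plan_and_rank(items, base_scores, prefs1)       # R_{t+1}
print(feed0, feed1)
```

In this sketch the tool chain is chosen per request, so a command with only negative feedback invokes only the filter, while an empty preference memory leaves the base ranking untouched. A real system would replace the keyword parser with a language model and the additive boost with learned scoring tools.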