ChainRec: An Agentic Recommender Learning to Route Tool Chains for Diverse and Evolving Interests

Fuchun Li, Qian Li, Xingyu Gao, Bocheng Pan, Yang Wu, Jun Zhang, Huan Yu, Jie Jiang, Jinsheng Xiao, Hailong Shi

TL;DR

ChainRec addresses the brittleness of fixed recommendation pipelines by introducing an agentic recommender that dynamically routes among a standardized library of evidence-gathering tools. A two-stage learning process (SFT to train the Planner, followed by Direct Preference Optimization to align tool routing) enables instance-specific, budget-bounded evidence collection. Tool construction from expert CoT traces yields a modular Tool Agent Library that decouples capability from policy, enabling robust planning across domains. Experiments on AgentRecBench across Amazon, Goodreads, and Yelp show consistent Avg HR@1/3/5 improvements, especially in cold-start and evolving-interest scenarios, demonstrating strong adaptability and practical impact for interactive recommendations.
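
The preference-optimization stage can be illustrated with the standard DPO objective for a single preference pair. The following is a minimal sketch, not the paper's training code: it assumes the pair consists of a preferred and a dispreferred tool chain for the same state, that summed log-probabilities under the trainable Planner and a frozen reference model are already available, and that `beta` is 0.1. The function name `dpo_loss` and all argument names are illustrative.

```python
import math


def dpo_loss(policy_logp_chosen: float,
             policy_logp_rejected: float,
             ref_logp_chosen: float,
             ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a whole tool chain
    (an assumption about how routing decisions are scored).
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # chain than the reference model does, scaled by beta.
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# When the policy matches the reference, the margin is 0 and the loss is log 2.
baseline = dpo_loss(-3.0, -4.0, -3.0, -4.0)

# When the policy shifts probability toward the chosen chain, the loss drops.
improved = dpo_loss(-2.0, -5.0, -3.0, -4.0)
```

The loss falls below log 2 only when the Planner ranks the preferred chain higher than the reference model does, which is what pushes routing toward the preferred tool order.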

Abstract

Large language models (LLMs) are increasingly integrated into recommender systems, motivating recent interest in agentic and reasoning-based recommendation. However, most existing approaches still rely on fixed workflows, applying the same reasoning procedure across diverse recommendation scenarios. In practice, user contexts vary substantially, for example in cold-start settings or during interest shifts, so an agent should adaptively decide what evidence to gather next rather than follow a scripted process. To address this, we propose ChainRec, an agentic recommender that uses a planner to dynamically select reasoning tools. ChainRec builds a standardized Tool Agent Library from expert trajectories. It then trains a planner using supervised fine-tuning and preference optimization to dynamically select tools, decide their order, and determine when to stop. Experiments on AgentRecBench across Amazon, Yelp, and Goodreads show that ChainRec consistently improves Avg HR@{1,3,5} over strong baselines, with especially notable gains in cold-start and evolving-interest scenarios. Ablation studies further validate the importance of tool standardization and preference-optimized planning.
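
For readers unfamiliar with the metric, the sketch below shows how a hit rate at cutoff k is typically computed. It assumes one ground-truth item per test instance and assumes that Avg HR@{1,3,5} denotes the plain mean of HR@1, HR@3, and HR@5; neither detail is stated in this summary. The function names and the toy data are illustrative only.

```python
from typing import Hashable, Sequence


def hit_rate_at_k(rankings: Sequence[Sequence[Hashable]],
                  targets: Sequence[Hashable],
                  k: int) -> float:
    """Fraction of instances whose target item appears in the top-k ranking."""
    if k <= 0:
        raise ValueError("k must be positive")
    if len(rankings) != len(targets):
        raise ValueError("rankings and targets must have the same length")
    if not rankings:
        return 0.0
    hits = sum(1 for ranked, target in zip(rankings, targets)
               if target in ranked[:k])
    return hits / len(rankings)


def avg_hit_rate(rankings: Sequence[Sequence[Hashable]],
                 targets: Sequence[Hashable],
                 ks: Sequence[int] = (1, 3, 5)) -> float:
    """Mean of HR@k over the cutoffs in ks (assumed definition of Avg HR)."""
    return sum(hit_rate_at_k(rankings, targets, k) for k in ks) / len(ks)


# Toy data: three users, five ranked candidates each, one target per user.
rankings = [
    ["a", "b", "c", "d", "e"],  # target "a" is at rank 1
    ["x", "y", "z", "w", "v"],  # target "z" is at rank 3
    ["p", "q", "r", "s", "t"],  # target "u" is not ranked at all
]
targets = ["a", "z", "u"]

hr1 = hit_rate_at_k(rankings, targets, 1)  # 1 of 3 users hit
hr3 = hit_rate_at_k(rankings, targets, 3)  # 2 of 3 users hit
avg = avg_hit_rate(rankings, targets)      # mean over k = 1, 3, 5
```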

Paper Structure (41 sections, 11 equations, 4 figures, 5 tables, 1 algorithm)

Figures (4)

  • Figure 1: Scenario-dependent CoT under the same prompt. Three samples yield distinct reasoning routes yet valid rankings, motivating dynamic planning that decomposes steps and learns to route tools by state.
  • Figure 2: t-SNE of full CoT embeddings under the same prompt. Colors show clusters; Samples 1–3 mark three runs from different scenarios, illustrating scenario-induced CoT differences.
  • Figure 3: ChainRec offline–online workflow. Offline: capability construction by mining expert CoT traces and encapsulating them into tools; planner training with SFT → DPO. Online: the Planner observes the current state in Memory, selects a tool from the Tool Agent Library, executes it to obtain evidence, updates Memory, and repeats until CandidateRank outputs the final ranking.
  • Figure 4: Prompt used to construct Chain-of-Thought (CoT) for recommendation. The template specifies the input (user interaction history and 20 candidates), requirements (step-by-step reasoning, CoT step labels, and a full ranking), and the structured output format.
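
The online loop described in the Figure 3 caption (observe Memory, select a tool, execute it, update Memory, repeat until CandidateRank) can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the `Memory` fields, the step budget, the rule-based `toy_planner` standing in for the learned Planner, and the tools `RecentInterest` and `PopularityPrior` are all hypothetical. Only the name `CandidateRank` comes from the caption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Memory:
    history: List[str]                      # items encoded as "genre:title"
    candidates: List[str]
    evidence: Dict[str, Set[str]] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)


def recent_interest(memory: Memory) -> Set[str]:
    """Hypothetical tool: genres of the two most recent interactions."""
    return {item.split(":")[0] for item in memory.history[-2:]}


def popularity_prior(memory: Memory) -> Set[str]:
    """Hypothetical cold-start tool: a fixed, globally popular genre."""
    return {"bestseller"}


def candidate_rank(memory: Memory) -> List[str]:
    """Terminal tool: candidates matching gathered evidence come first."""
    preferred: Set[str] = set()
    for genres in memory.evidence.values():
        preferred |= genres
    # sorted() is stable, so ties keep the original candidate order.
    return sorted(memory.candidates,
                  key=lambda c: c.split(":")[0] not in preferred)


def toy_planner(memory: Memory) -> str:
    """Rule-based stand-in for the learned Planner (an assumption)."""
    if not memory.history and "PopularityPrior" not in memory.evidence:
        return "PopularityPrior"
    if memory.history and "RecentInterest" not in memory.evidence:
        return "RecentInterest"
    return "CandidateRank"


def run_chain(memory: Memory,
              planner: Callable[[Memory], str],
              tools: Dict[str, Callable[[Memory], Set[str]]],
              budget: int = 4) -> List[str]:
    """Route tools until the planner stops or the step budget runs out."""
    for _ in range(budget):
        name = planner(memory)            # observe state, pick next tool
        if name == "CandidateRank":       # planner decides to stop
            break
        memory.evidence[name] = tools[name](memory)  # execute, update Memory
        memory.trace.append(name)
    memory.trace.append("CandidateRank")
    return candidate_rank(memory)


tools = {"RecentInterest": recent_interest, "PopularityPrior": popularity_prior}
candidates = ["romance:Persuasion", "scifi:Hyperion", "bestseller:Atlas"]

warm = Memory(history=["romance:Emma", "scifi:Dune", "scifi:Foundation"],
              candidates=candidates)
warm_ranking = run_chain(warm, toy_planner, tools)

cold = Memory(history=[], candidates=candidates)
cold_ranking = run_chain(cold, toy_planner, tools)
```

The warm-start and cold-start users follow different tool chains from the same code path, which is the behavior the caption attributes to state-dependent routing. In ChainRec the selection step would be made by the trained Planner rather than by hand-written rules.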