R$^2$ec: Towards Large Recommender Models with Reasoning

Runyang You, Yongqi Li, Xinyu Lin, Xin Zhang, Wenjie Wang, Wenjie Li, Liqiang Nie

TL;DR

This work addresses the fragmentation between reasoning and recommendation in large recommender systems by proposing R$^2$ec, a unified decoder-based model with dual heads for reasoning (lm_head) and item prediction (rec_head). It introduces RecPO, a reinforcement learning framework that jointly optimizes reasoning trajectories and final recommendations without requiring annotated rationales, using a fused reward that combines discrete ranking signals with continuous similarity measures. Empirical results across three Amazon domains show that R$^2$ec consistently surpasses traditional, LLM-based, and reasoning-augmented baselines, while maintaining competitive inference latency. The study also provides extensive analyses of optimization dynamics, reasoning behavior, and cross-domain robustness, highlighting the practical value of tightly coupling reasoning with recommendation for scalable, interpretable, and efficient recommender systems.
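The fused reward mentioned above can be illustrated with a minimal sketch. The summary only states that a discrete ranking signal is combined with a continuous similarity measure; the specific choices here (NDCG@k as the ranking signal, the softmax probability of the target item as the similarity signal, and a linear mix with a weight `beta`) are assumptions for illustration, and all function names are hypothetical.

```python
import math

def ndcg_at_k(rank, k):
    """Discrete signal (assumed form): NDCG@k for one target at 1-based `rank`."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0

def softmax_prob(scores, target):
    """Continuous signal (assumed form): softmax probability of the target item."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return exps[target] / sum(exps)

def fused_reward(scores, target, k=5, beta=0.5):
    """Linear mix of the two signals; `beta` is a hypothetical mixing weight."""
    # Rank of the target: 1 + number of items scored strictly higher.
    rank = 1 + sum(s > scores[target] for s in scores)
    return beta * ndcg_at_k(rank, k) + (1 - beta) * softmax_prob(scores, target)

scores = [2.0, 1.0, 0.0]  # toy item scores for one sampled trajectory
print(fused_reward(scores, target=0, k=2))
print(fused_reward(scores, target=2, k=2))
```

The discrete term is zero whenever the target falls outside the top k, so on its own it gives no gradient signal between two "misses"; the continuous term still distinguishes them, which is one plausible motivation for fusing the two.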

Abstract

Large recommender models have extended LLMs as powerful recommenders via encoding or item generation, and recent breakthroughs in LLM reasoning have in turn motivated the exploration of reasoning in recommendation. In this work, we propose R$^2$ec, a unified large recommender model with intrinsic reasoning capability. R$^2$ec introduces a dual-head architecture that supports both reasoning chain generation and efficient item prediction in a single model, significantly reducing inference latency. To overcome the lack of annotated reasoning data, we design RecPO, a reinforcement learning framework that optimizes reasoning and recommendation jointly with a novel fused reward mechanism. Extensive experiments on three datasets demonstrate that R$^2$ec outperforms traditional, LLM-based, and reasoning-augmented recommender baselines, while further analyses validate that its efficiency is competitive with conventional LLM-based recommender baselines and that it adapts well to diverse recommendation scenarios. Code and checkpoints are available at https://github.com/YRYangang/RRec.
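The dual-head idea can be sketched as two projections of the same decoder hidden state. The head names `lm_head` and `rec_head` come from the paper; everything else is an assumption made for illustration: that each head is a plain inner product, that `rec_head` scores items against a table of item embeddings, and that the helper `recommend` exists at all.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def lm_head(hidden, vocab_weights):
    """Reasoning head: logits over vocabulary tokens for the next reasoning token."""
    return [dot(hidden, w) for w in vocab_weights]

def rec_head(hidden, item_embeddings):
    """Recommendation head: one score per candidate item (assumed inner product)."""
    return [dot(hidden, e) for e in item_embeddings]

def recommend(hidden, item_embeddings, top_k=2):
    """Hypothetical helper: indices of the top_k items by rec_head score."""
    scores = rec_head(hidden, item_embeddings)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]

# Toy example: a 2-d hidden state, a 3-token vocabulary, and 3 candidate items.
hidden = [1.0, 0.0]
vocab_weights = [[0.2, 0.1], [0.9, 0.3], [0.4, 0.4]]
item_embeddings = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]

print(lm_head(hidden, vocab_weights))      # used while generating the reasoning chain
print(recommend(hidden, item_embeddings))  # used once, at the final position
```

Under these assumptions, item prediction is a single scoring pass over the item table rather than token-by-token decoding of an item title, which is consistent with the latency claim in the abstract.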

Paper Structure

This paper contains 54 sections, 15 equations, 10 figures, 5 tables, 1 algorithm.

Figures (10)

  • Figure 1: The architecture overview of $\text{R}^2\text{ec}$, which facilitates interleaved reasoning and recommendation in an autoregressive process with two task-specific heads: 1) language-modeling head (lm_head) for reasoning generation; and 2) recommendation head (rec_head) for item prediction.
  • Figure 2: Analysis on advantage estimation methods, RLOO and GRPO, across two datasets. "Train Reward" and "Val Reward" indicate the variation in rewards on the training set and validation set, respectively. "Val Length" represents the variation in reasoning length on the validation set.
  • Figure 3: Analysis on trajectory sampling and group size over the CDs dataset. The first two panels show the impact of temperature and top-$K$ sampling on performance and reasoning length, respectively. The last two panels present the effect of group size on NDCG@5 and Hit Rate@5, respectively.
  • Figure 4: Distribution of reasoning behaviors across datasets. Each bar represents the proportion of reasoning outputs exhibiting a given reasoning behavior within a dataset.
  • Figure 5: Prompt templates for user interaction history and item metadata. The User Prompt encodes a user's past purchases as a sequence of item titles, relative timestamps (e.g., “2hrs”, “4d”), and explicit ratings (in $[1,5]$), followed by an instruction to analyze and recommend the next item within the span of <answer> and </answer>. The Item Prompt summarizes structured item attributes (e.g., brand, type, features) with the same format requirement.
  • ...and 5 more figures
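
Figure 2 compares two advantage estimation methods, RLOO and GRPO. As a rough reference, the sketch below uses the commonly cited forms of both estimators for a group of rewards sampled from the same prompt: GRPO standardizes each reward by the group mean and standard deviation, while RLOO subtracts the mean of the other samples in the group. Whether the paper uses exactly these variants (for example, the standard-deviation normalization or the `eps` term) is an assumption.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantage: (r_i - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]

def rloo_advantages(rewards):
    """Leave-one-out advantage: r_i minus the mean reward of the other samples."""
    n = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]

rewards = [1.0, 0.0, 0.0, 1.0]  # toy rewards for a group of four sampled trajectories
print(grpo_advantages(rewards))
print(rloo_advantages(rewards))
```

Both estimators are zero-mean within a group; the main difference in this form is that GRPO rescales by the group's reward spread while RLOO keeps the raw reward scale.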