OneRec-Think: In-Text Reasoning for Generative Recommendation

Zhanyu Liu, Shiyao Wang, Xingmei Wang, Rongzhou Zhang, Jiaxin Deng, Honghui Bao, Jinghao Zhang, Wuchao Li, Pengfei Zheng, Xiangyu Wu, Yifei Hu, Qigen Hu, Xinchen Luo, Lejian Ren, Zixing Zhang, Qianqian Wang, Kuo Cai, Yunfan Wu, Hongtao Cheng, Zexuan Cheng, Lu Ren, Huanjie Wang, Yi Su, Ruiming Tang, Kun Gai, Guorui Zhou

TL;DR

OneRec-Think addresses the gap between generative recommendation and explicit reasoning with an integrated framework that grounds item semantics in text, activates step-by-step reasoning within recommendation, and refines that reasoning through reinforcement learning with a reasoning-aware reward. Its three stages (Itemic Alignment, Reasoning Activation, and Reasoning Enhancement), paired with a Think-Ahead deployment architecture, yield state-of-the-art results on public benchmarks and measurable industrial gains, including a 0.159% increase in APP Stay Time on Kuaishou. The approach produces interpretable rationales alongside high-quality recommendations and carries over from offline benchmarks to real-world deployment. The authors also note the limitations of public datasets and point to larger-scale, long-sequence evaluation as future work. Overall, OneRec-Think makes explicit reasoning a core, deployable component of generative recommendation, enabling more transparent and effective user experiences.
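This summary does not describe how the Think-Ahead architecture works internally; its name suggests that the expensive reasoning step runs ahead of the latency-critical request path. The sketch below illustrates that general pattern only, under stated assumptions: a slow reasoning function (hypothetical `reason_fn`) produces a rationale plus a coarse semantic-ID prefix ahead of time, results are cached with a time-to-live, and a fast online function (hypothetical `complete_fn`) finishes the item at request time. The class name, both callables, the prefix split, and the cache policy are all illustrative assumptions, not the paper's serving design.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

ItemCode = Tuple[int, ...]


@dataclass
class ThinkAheadServer:
    """Hypothetical split: slow reasoning runs ahead of requests, fast completion runs online."""

    # user_id -> (rationale text, coarse semantic-ID prefix); assumed expensive (LLM reasoning)
    reason_fn: Callable[[str], Tuple[str, ItemCode]]
    # (user_id, prefix) -> full item code; assumed cheap enough for the request path
    complete_fn: Callable[[str, ItemCode], ItemCode]
    ttl_s: float = 3600.0  # how long a precomputed rationale stays usable
    _cache: Dict[str, Tuple[float, str, ItemCode]] = field(
        default_factory=dict, init=False, repr=False
    )

    def precompute(self, user_id: str) -> None:
        """Offline/near-line step: run the expensive reasoning and cache its output."""
        rationale, prefix = self.reason_fn(user_id)
        self._cache[user_id] = (time.monotonic(), rationale, prefix)

    def serve(self, user_id: str) -> Tuple[str, ItemCode]:
        """Online step: reuse a fresh cached result, else fall back to reasoning inline."""
        entry = self._cache.get(user_id)
        if entry is None or time.monotonic() - entry[0] > self.ttl_s:
            self.precompute(user_id)
            entry = self._cache[user_id]
        _, rationale, prefix = entry
        return rationale, self.complete_fn(user_id, prefix)


# Usage with stub functions standing in for the real reasoning and completion models.
calls = {"reason": 0}


def slow_reason(user_id: str) -> Tuple[str, ItemCode]:
    calls["reason"] += 1
    return f"{user_id} prefers relaxing travel content", (3, 17)


def fast_complete(user_id: str, prefix: ItemCode) -> ItemCode:
    return prefix + (42,)


server = ThinkAheadServer(slow_reason, fast_complete)
server.precompute("u1")     # done ahead of the request
print(server.serve("u1"))   # ('u1 prefers relaxing travel content', (3, 17, 42))
print(calls["reason"])      # 1: the online call reused the cached reasoning
```

The design choice illustrated here is that only the cheap completion step sits on the request path; a cache miss degrades to inline reasoning rather than failing.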

Abstract

The powerful generative capacity of Large Language Models (LLMs) has instigated a paradigm shift in recommendation. However, existing generative models (e.g., OneRec) operate as implicit predictors, critically lacking the capacity for explicit and controllable reasoning, a key advantage of LLMs. To bridge this gap, we propose OneRec-Think, a unified framework that seamlessly integrates dialogue, reasoning, and personalized recommendation. OneRec-Think incorporates: (1) Itemic Alignment: cross-modal Item-Textual Alignment for semantic grounding; (2) Reasoning Activation: Reasoning Scaffolding to activate LLM reasoning within the recommendation context; and (3) Reasoning Enhancement: a recommendation-specific reward function that we design to account for the multi-validity nature of user preferences. Experiments across public benchmarks show state-of-the-art performance. Moreover, our proposed "Think-Ahead" architecture enables effective industrial deployment on Kuaishou, achieving a 0.159% gain in APP Stay Time and validating the practical efficacy of the model's explicit reasoning capability.
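The abstract does not spell out the reward function. As a hypothetical illustration of what "multi-validity" means, the sketch below assumes each item is encoded as a tuple of hierarchical semantic-ID codes (coarse to fine) and scores a generated item against every item the user would accept, taking the best partial (prefix) match instead of requiring an exact hit on one logged item. The names `prefix_match`, `single_target_reward`, and `multi_valid_reward`, and the prefix-match scoring rule itself, are assumptions for illustration, not the paper's formulation.

```python
from typing import Sequence, Tuple

# Hypothetical item encoding: a tuple of hierarchical semantic-ID codes (coarse -> fine).
ItemCode = Tuple[int, ...]


def prefix_match(pred: ItemCode, target: ItemCode) -> float:
    """Fraction of target codes matched from the start, stopping at the first mismatch."""
    matched = 0
    for p, t in zip(pred, target):
        if p != t:
            break
        matched += 1
    return matched / max(len(target), 1)


def single_target_reward(pred: ItemCode, target: ItemCode) -> float:
    """Baseline: credit only an exact hit on the one logged next item."""
    return 1.0 if pred == target else 0.0


def multi_valid_reward(pred: ItemCode, valid_targets: Sequence[ItemCode]) -> float:
    """Credit the best partial match among every item the user would accept."""
    if not valid_targets:
        return 0.0
    return max(prefix_match(pred, t) for t in valid_targets)


# Usage: the user later engaged with three items; the model proposes a different one.
valid = [(3, 17, 42), (3, 17, 8), (9, 1, 5)]
pred = (3, 17, 99)
print(single_target_reward(pred, valid[0]))       # 0.0
print(round(multi_valid_reward(pred, valid), 3))  # 0.667
print(multi_valid_reward((3, 17, 8), valid))      # 1.0
```

Under the single-target baseline, a plausible item that shares the user's coarse interest scores zero; the multi-validity version gives it partial credit. Whether that credit should come from prefix overlap, a learned similarity, or something else is a design choice the abstract leaves open.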

Paper Structure

This paper contains 46 sections, 11 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Examples of OneRec-Think's Unified Dialogue, Reasoning and Recommendation Framework.
  • Figure 2: The framework of OneRec-Think. In the first stage, we achieve item-level semantic alignment through multi-task pre-training. In the second stage, we activate explicit reasoning by prompting the model to generate preference rationales. In the third stage, we refine the reasoning paths through RL based on a reward tailored for recommendation.
  • Figure 3: Demonstration of context-aware recommendation adaptation: our model dynamically shifts recommendations to relaxing content based on the user's command.
  • Figure 4: Demonstration of fine-grained interest reasoning, which shows the end-to-end process from user behavior analysis to interpretable recommendations.
  • Figure 5: The model's reasoning process evolves from broad interest matching (left) to fine-grained theme specification (middle), with recommendations (right) showing semantic consistency with each reasoning step.
  • ...and 2 more figures
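The first two stages in Figure 2 both train on text that mixes natural language with item identifiers. The sketch below shows one hypothetical way to build such samples, assuming items are encoded as three-level semantic IDs rendered as special tokens such as `<s_a_3>`, and that the preference rationale is wrapped in `<think>` tags before the recommended item. The token format, tag names, helper names, and prompt wording are illustrative assumptions, not the paper's templates; the rationale is simply passed in as a string rather than generated by a model.

```python
from typing import Dict, List, Tuple

# Hypothetical itemic-token format: one special token per semantic-ID level.
ItemCode = Tuple[int, ...]
LEVELS = ("a", "b", "c")


def item_to_tokens(code: ItemCode) -> str:
    """Render a hierarchical semantic ID (coarse -> fine) as itemic tokens."""
    if len(code) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} codes, got {len(code)}")
    return "".join(f"<s_{lvl}_{c}>" for lvl, c in zip(LEVELS, code))


def history_text(history: List[ItemCode]) -> str:
    """Embed a user's history as itemic tokens inside natural language."""
    return "The user recently watched: " + ", ".join(item_to_tokens(c) for c in history) + "."


def stage1_alignment_sample(code: ItemCode, caption: str) -> Dict[str, str]:
    """Stage 1 (Itemic Alignment): learn to describe an item from its itemic tokens."""
    return {"prompt": f"Describe the item {item_to_tokens(code)}.", "target": caption}


def stage2_reasoning_sample(
    history: List[ItemCode], rationale: str, next_item: ItemCode
) -> Dict[str, str]:
    """Stage 2 (Reasoning Activation): emit a preference rationale, then the item."""
    return {
        "prompt": history_text(history)
        + " Explain the user's preference, then recommend the next item.",
        "target": f"<think>{rationale}</think>{item_to_tokens(next_item)}",
    }


# Usage
print(stage1_alignment_sample((3, 17, 42), "A calm lakeside travel vlog.")["prompt"])
# Describe the item <s_a_3><s_b_17><s_c_42>.
sample = stage2_reasoning_sample(
    [(3, 17, 42), (3, 17, 8)], "Both are calm travel vlogs.", (3, 17, 99)
)
print(sample["target"])
# <think>Both are calm travel vlogs.</think><s_a_3><s_b_17><s_c_99>
```

Placing the rationale before the item tokens means the item is generated conditioned on the stated preference, which is what lets the recommendation stay semantically consistent with the reasoning shown in Figures 4 and 5.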