OneRec-Think: In-Text Reasoning for Generative Recommendation
Zhanyu Liu, Shiyao Wang, Xingmei Wang, Rongzhou Zhang, Jiaxin Deng, Honghui Bao, Jinghao Zhang, Wuchao Li, Pengfei Zheng, Xiangyu Wu, Yifei Hu, Qigen Hu, Xinchen Luo, Lejian Ren, Zixing Zhang, Qianqian Wang, Kuo Cai, Yunfan Wu, Hongtao Cheng, Zexuan Cheng, Lu Ren, Huanjie Wang, Yi Su, Ruiming Tang, Kun Gai, Guorui Zhou
TL;DR
OneRec-Think addresses the gap between generative recommendation and explicit reasoning with an integrated framework that grounds item semantics in text, activates step-by-step reasoning within the recommendation context, and refines that reasoning through reinforcement learning with a reasoning-aware reward. Its three stages (Itemic Alignment, Reasoning Activation, and Reasoning Enhancement), paired with a "Think-Ahead" deployment architecture, yield state-of-the-art results on public benchmarks and a measurable industrial gain: a 0.159% increase in APP Stay Time on Kuaishou. The approach produces interpretable rationales alongside its recommendations and transfers from public benchmarks to real-world deployment. The paper also outlines the limitations of public datasets and points to larger-scale, long-sequence evaluation as future work. Overall, OneRec-Think makes explicit reasoning a core, deployable component of generative recommendation.
Abstract
The powerful generative capacity of Large Language Models (LLMs) has instigated a paradigm shift in recommendation. However, existing generative models (e.g., OneRec) operate as implicit predictors and critically lack the capacity for explicit and controllable reasoning, a key advantage of LLMs. To bridge this gap, we propose OneRec-Think, a unified framework that seamlessly integrates dialogue, reasoning, and personalized recommendation. OneRec-Think incorporates: (1) Itemic Alignment: cross-modal Item-Textual Alignment for semantic grounding; (2) Reasoning Activation: Reasoning Scaffolding to activate LLM reasoning within the recommendation context; and (3) Reasoning Enhancement: a recommendation-specific reward function that accounts for the multi-validity nature of user preferences. Experiments across public benchmarks show state-of-the-art performance. Moreover, our proposed "Think-Ahead" architecture enables effective industrial deployment on Kuaishou, achieving a 0.159% gain in APP Stay Time and validating the practical efficacy of the model's explicit reasoning capability.
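
The "multi-validity" idea in stage (3) can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the reward function used in the paper: it assumes that a user may have several acceptable next items and that a reward should credit a hit on any of them, rather than on a single ground-truth item. The function names `single_target_reward` and `multi_valid_reward`, and the item IDs, are hypothetical.

```python
from typing import Sequence, Set


def single_target_reward(candidates: Sequence[str], target: str) -> float:
    """Baseline for contrast: credit only an exact hit on one ground-truth item."""
    return 1.0 if target in candidates else 0.0


def multi_valid_reward(candidates: Sequence[str], valid_items: Set[str]) -> float:
    """Illustrative assumption: credit a hit on ANY item the user would accept."""
    return 1.0 if any(c in valid_items for c in candidates) else 0.0


# Hypothetical example: the model proposes two items, and the user would
# have been satisfied with either "item_b" or "item_c".
generated = ["item_a", "item_b"]
strict = single_target_reward(generated, target="item_c")
lenient = multi_valid_reward(generated, valid_items={"item_b", "item_c"})
```

Under the single-target view, the generation above earns no reward even though it contains an item the user would accept; the multi-valid view credits it. How the paper actually defines and scales its reward is not specified in the abstract.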
