REG4Rec: Reasoning-Enhanced Generative Model for Large-Scale Recommendation Systems

Haibo Xing, Hao Deng, Yucheng Mao, Lingyu Mu, Jinxin Hu, Yi Xu, Hao Zhang, Jiahao Wang, Shizhun Wang, Yu Zhang, Xiaoyi Zeng, Jing Zhang

TL;DR

REG4Rec tackles the challenge of diverse and reliable reasoning in generative sequential recommendation by introducing a Mixture-of-Experts-based Parallel Quantization Codebook (MPQ) to create a large, dynamic reasoning space. It couples MPQ with Confidence-based Reasoning Step Selection (CRSS) and a two-stage training framework (PARS and MSRA), augmented by Consistency-Oriented Self-Reflection for Pruning (CORP) at inference, to yield flexible yet reliable reasoning paths. Extensive offline evaluation across four datasets shows state-of-the-art performance with substantial gains, and online deployment in a commercial advertising system demonstrates practical impact on revenue, CTR, and GMV. Overall, REG4Rec provides a scalable, reliability-aware reinforcement mechanism for generative recommendation in large-scale settings, with strong potential for real-world adoption and further research into adaptive rewards and explainability.

Abstract

Sequential recommendation aims to predict a user's next action in large-scale recommender systems. While traditional methods often suffer from insufficient information interaction, recent generative recommendation models partially address this issue by directly generating item predictions. To better capture user intents, recent studies have introduced a reasoning process into generative recommendation, significantly improving recommendation performance. However, these approaches are constrained by the singularity of item semantic representations, facing challenges such as limited diversity in reasoning pathways and insufficient reliability in the reasoning process. To tackle these issues, we introduce REG4Rec, a reasoning-enhanced generative model that constructs multiple dynamic semantic reasoning paths alongside a self-reflection process, ensuring high-confidence recommendations. Specifically, REG4Rec utilizes an MoE-based parallel quantization codebook (MPQ) to generate multiple unordered semantic tokens for each item, thereby constructing a larger-scale diverse reasoning space. Furthermore, to enhance the reliability of reasoning, we propose a training reasoning enhancement stage, which includes Preference Alignment for Reasoning (PARS) and a Multi-Step Reward Augmentation (MSRA) strategy. PARS uses reward functions tailored for recommendation to enhance reasoning and reflection, while MSRA introduces future multi-step actions to improve overall generalization. During inference, Consistency-Oriented Self-Reflection for Pruning (CORP) is proposed to discard inconsistent reasoning paths, preventing the propagation of erroneous reasoning. Lastly, we develop an efficient offline training strategy for large-scale recommendation. Experiments on real-world datasets and online evaluations show that REG4Rec delivers outstanding performance and substantial practical value.
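To make the idea of "multiple unordered semantic tokens for each item" concrete, here is a minimal sketch of parallel quantization in plain Python. It is an illustration under stated assumptions, not the authors' MPQ: the MoE component is omitted entirely, the codebooks are random rather than learned, and the names `nearest_code` and `parallel_quantize` are hypothetical. Each codebook quantizes the same item embedding independently, so the resulting tokens carry no inherent order.

```python
import random


def nearest_code(vec, codebook):
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def parallel_quantize(item_embedding, codebooks):
    """Quantize one embedding against every codebook independently.

    Returns a frozenset of (codebook_id, code_id) tokens. A set is used
    because no codebook depends on another codebook's output, so the
    tokens are unordered (an assumption made for this sketch).
    """
    return frozenset(
        (k, nearest_code(item_embedding, codebook))
        for k, codebook in enumerate(codebooks)
    )


# Toy setup (hypothetical sizes): 3 codebooks, 8 codewords each, dimension 4.
rng = random.Random(0)
codebooks = [
    [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(8)]
    for _ in range(3)
]
item = [0.1, -0.3, 0.7, 0.2]
tokens = parallel_quantize(item, codebooks)
print(sorted(tokens))
```

With K independent codebooks of size C, an item is described by K tokens drawn from up to C^K combinations, which is one way to read the abstract's claim of a larger and more diverse reasoning space.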

Paper Structure

This paper contains 25 sections, 16 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Comparison of reasoning in GR: (a) GR with a limited reasoning process; (b) our REG4Rec enables a diverse and reliable reasoning process to better capture the variety of users' intents.
  • Figure 2: REG4Rec introduces MPQ for generating flexible and reliable reasoning paths. During training, CRSS diversifies path combinations, while PARS selects consistent, confident paths. MSRA extends the reward horizon to capture long-term preferences and reduce noise caused by stochastic behavior. During inference, CORP prunes inconsistent paths, preventing the propagation of errors in the reasoning process.
  • Figure 3: Sensitivity Analysis
  • Figure 4: The performance of REG4Rec across different numbers of reasoning steps.