MiniOneRec: An Open-Source Framework for Scaling Generative Recommendation

Xiaoyu Kong, Leheng Sheng, Junfei Tan, Yuxin Chen, Jiancan Wu, An Zhang, Xiang Wang, Xiangnan He

TL;DR

The paper investigates whether generative recommender systems scale on public data. It introduces MiniOneRec, which the authors describe as the first fully open-source framework of its kind, combining SID construction via RQ-VAE, supervised fine-tuning, and reinforcement-learning-driven optimization. Scaling the model from 0.5B to 7B parameters yields consistent gains, and a lightweight post-training pipeline adds full-process SID alignment plus reinforcement learning with constrained decoding and a hybrid reward design. In extensive experiments on Amazon subsets, MiniOneRec outperforms traditional, generative, and several LLM-based baselines, while its SID-based representations reduce context size and improve serving efficiency. Comprehensive ablations and transferability analyses show the importance of alignment, sampling strategy, and reward design, and highlight the value of pre-trained LLM initialization. Overall, MiniOneRec offers an open, reproducible platform for research and practice in scalable generative recommendation, with clear post-training guidance.

Abstract

The recent success of large language models (LLMs) has renewed interest in whether recommender systems can achieve similar scaling benefits. Conventional recommenders, dominated by massive embedding tables, tend to plateau as embedding dimensions grow. In contrast, the emerging generative paradigm replaces embeddings with compact Semantic ID (SID) sequences produced by autoregressive Transformers. Yet most industrial deployments remain proprietary, leaving two fundamental questions open: (1) Do the expected scaling laws hold on public benchmarks? (2) What is the minimal post-training recipe that enables competitive performance? We present MiniOneRec, to the best of our knowledge, the first fully open-source generative recommendation framework, which provides an end-to-end workflow spanning SID construction, supervised fine-tuning, and recommendation-oriented reinforcement learning. We generate SIDs via a Residual Quantized VAE and post-train Qwen backbones ranging from 0.5B to 7B parameters on the Amazon Review dataset. Our experiments reveal a consistent downward trend in both training and evaluation losses with increasing model size, validating the parameter efficiency of the generative approach. To further enhance performance, we propose a lightweight yet effective post-training pipeline that (1) enforces full-process SID alignment and (2) applies reinforcement learning with constrained decoding and hybrid rewards. Together, these techniques yield significant improvements in both ranking accuracy and candidate diversity.
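
To make the idea of a Semantic ID concrete, the following is a minimal sketch of the residual quantization step that sits at the core of an RQ-VAE. It is not the paper's implementation: the encoder, decoder, and codebook training are omitted, the codebooks are hand-written toy values, and the names `nearest_code` and `residual_quantize` are hypothetical. Each level picks the codeword closest to what the previous levels left unexplained, and the sequence of chosen indices forms the SID.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_code(residual: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codeword closest (squared L2) to the residual."""
    def sq_dist(code: Vector) -> float:
        return sum((r - c) ** 2 for r, c in zip(residual, code))

    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def residual_quantize(embedding: Vector,
                      codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize an item embedding level by level; the indices form its SID."""
    residual = list(embedding)
    sid: List[int] = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        sid.append(idx)
        # Pass on only what this level failed to explain.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return sid


# Toy, hand-written codebooks (assumption: a trained RQ-VAE would learn these).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],    # level 1: coarse codewords
    [[0.0, 0.0], [0.5, -0.5]],   # level 2: finer corrections
]

sid = residual_quantize([1.4, 0.6], codebooks)
print(sid)  # [1, 1]
```

Here the embedding `[1.4, 0.6]` is closest to the level-1 codeword `[1.0, 1.0]`, and the remaining residual of roughly `[0.4, -0.4]` is closest to the level-2 codeword `[0.5, -0.5]`, so the item is represented by the two-token SID `[1, 1]` rather than by a row in an embedding table.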

Paper Structure

This paper contains 28 sections, 8 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Left: Scaling curves from 0.5B to 7B parameters. Right: Effect of world knowledge on model performance: MiniOneRec-W/O ALIGN uses pretrained LLM weights but omits SID–text alignment, while MiniOneRec-Scratch is trained from random initialization and omits alignment.
  • Figure 2: MiniOneRec framework. RQ-VAE builds the item SID codebook. We then perform SFT to warm up the LLM and obtain an initial alignment. In RL, beam search with constrained decoding ensures that the model sequentially produces a ranked list of distinct, valid SIDs. GRPO updates the policy, and SID alignment is enforced end-to-end. This alignment objective is preserved throughout both the SFT and RL stages, fostering deeper semantic understanding.
  • Figure 3: Evaluation loss vs. SFT training epoch
  • Figure 4: Study on the effectiveness of MiniOneRec’s individual components. The three panels examine, in order, model performance under different alignment strategies, various sampling strategies, and the impact of alternative reward designs.
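
The Figure 2 caption mentions beam search with constrained decoding that yields a ranked list of distinct, valid SIDs. One common way to get that behavior is to restrict each decoding step to the children of a prefix trie built from the item catalog; the sketch below illustrates this under stated assumptions and is not taken from the paper's code. The function names, the fixed SID length, and the toy scoring table standing in for the LLM's per-token log-probabilities are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Prefix = Tuple[int, ...]


def build_trie(valid_sids: Sequence[Prefix]) -> Dict:
    """Nested-dict prefix trie over every SID that exists in the catalog."""
    trie: Dict = {}
    for sid in valid_sids:
        node = trie
        for token in sid:
            node = node.setdefault(token, {})
    return trie


def constrained_beam_search(trie: Dict,
                            score_fn: Callable[[Prefix, int], float],
                            sid_length: int,
                            beam_width: int) -> List[Tuple[Prefix, float]]:
    """Beam search that only expands tokens allowed by the trie.

    Assumes all SIDs share the same length. Every surviving beam is a
    distinct path in the trie, so results are distinct, valid SIDs.
    """
    beams = [((), 0.0, trie)]
    for _ in range(sid_length):
        candidates = []
        for prefix, logp, node in beams:
            for token, child in node.items():  # invalid tokens never appear
                new_logp = logp + score_fn(prefix, token)
                candidates.append((prefix + (token,), new_logp, child))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return [(prefix, logp) for prefix, logp, _ in beams]


# Toy catalog and toy scores (assumption: a real system would query the LLM).
valid = [(0, 1), (0, 2), (1, 0)]
toy_logp = {
    (0,): math.log(0.6), (1,): math.log(0.4),
    (0, 1): math.log(0.7), (0, 2): math.log(0.3),
    (1, 0): math.log(1.0),
}


def toy_score(prefix: Prefix, token: int) -> float:
    return toy_logp[prefix + (token,)]


ranked = constrained_beam_search(build_trie(valid), toy_score,
                                 sid_length=2, beam_width=2)
print([sid for sid, _ in ranked])  # [(0, 1), (1, 0)]
```

Because expansion is limited to trie children, a sequence such as `(1, 1)` that matches no catalog item can never be generated, and the returned beams double as a ranked recommendation list.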