OnePiece: The Great Route to Generative Recommendation -- A Case Study from Tencent Algorithm Competition
Jiangxia Cao, Shuo Yang, Zijun Wang, Qinghai Tan
TL;DR
The paper investigates scaling laws in generative recommender systems by unifying retrieval and generation within a single encoder–decoder backbone. It introduces a Semantic Tokenizer with Collaborative Residual K-means to produce SID codes and a cascade inference pipeline that combines SID beam-search with InfoNCE-based scoring, trained under a joint objective. Empirical results show both SID-based generative losses and embedding-based InfoNCE losses follow power-law scaling with high fit (R^2>0.9), with deeper architectures delivering stronger ranking signals. The work demonstrates a scalable, efficient approach for industrial-scale generative recommendations and highlights directions toward billion-parameter multi-modal backbones and end-to-end differentiable optimization.
Abstract
In recent years, OpenAI's scaling laws have shown that the next-token prediction paradigm in neural language modeling yields remarkable intelligence, pointing to a "free-lunch" way to improve model performance by scaling model parameters. In RecSys, the retrieval stage also follows a "next-token prediction" paradigm, recalling hundreds of items from the global item set; generative recommendation therefore usually refers specifically to the retrieval stage (excluding tree-based methods). This raises a philosophical question: without a ground-truth next item, does generative recommendation also obey a scaling law? In retrospect, generative recommendation has two technical paradigms: (1) the ANN-based framework, which uses a compressed user embedding to retrieve the nearest items in embedding space, e.g., Kuaiformer; and (2) the auto-regressive framework, which uses beam search to decode items from the whole item space, e.g., OneRec. In this paper, we devise a unified encoder-decoder framework to validate the scaling laws of both paradigms at the same time. Our empirical finding is that both losses strictly adhere to power-law scaling laws ($R^2 > 0.9$) within our unified architecture.
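To make the $R^2 > 0.9$ criterion concrete, the following is a minimal sketch of how a power-law fit and its goodness of fit could be computed. It assumes the simple form $L(N) = a N^{-b}$, fitted by least squares in log-log space; the abstract does not state the exact functional form, the fitting procedure, or the scaling variable, so the function name `fit_power_law`, the choice of model size as $N$, and all numbers below are hypothetical and purely illustrative.

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit loss ~= a * size**(-b) by least squares in log-log space.

    Returns (a, b, r2), where r2 is the coefficient of determination of the
    linear fit log(loss) = log(a) - b * log(size).
    Assumption: no irreducible-loss offset term is modeled.
    """
    x = np.log(np.asarray(sizes, dtype=float))
    y = np.log(np.asarray(losses, dtype=float))

    # Degree-1 polynomial fit: y ~= slope * x + intercept.
    slope, intercept = np.polyfit(x, y, 1)

    # R^2 = 1 - SS_res / SS_tot, computed on the log-transformed values.
    y_hat = slope * x + intercept
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    return float(np.exp(intercept)), float(-slope), float(r2)


# Synthetic, made-up measurements (NOT results from the paper):
# a noisy power law with a = 12.0 and b = 0.08.
rng = np.random.default_rng(0)
sizes = np.array([1e6, 4e6, 1.6e7, 6.4e7, 2.56e8])
noise = np.exp(rng.normal(0.0, 0.005, sizes.size))
losses = 12.0 * sizes ** -0.08 * noise

a, b, r2 = fit_power_law(sizes, losses)
print(f"a={a:.3f}  b={b:.4f}  R^2={r2:.4f}")
```

Under these assumptions, a loss curve would count as following the power law when the returned `r2` exceeds 0.9. Fitting in log-log space keeps the problem linear; a fit with an additive irreducible-loss term would instead need a nonlinear optimizer.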
