GReF: A Unified Generative Framework for Efficient Reranking via Ordered Multi-token Prediction
Zhijie Lin, Zhuofeng Li, Chenglei Dai, Wentian Bao, Shuai Lin, Enyun Yu, Haoxiang Zhang, Liang Zhao
TL;DR
Reranking in multi-stage recommender systems is hard because the space of permutations is combinatorial, and because traditional generator–evaluator setups prevent end-to-end training and rely on slow autoregressive generation. The authors introduce GReF, a Unified Generative Efficient Reranking Framework built on Gen-Reranker (a bidirectional encoder plus a dynamic autoregressive decoder), which is pre-trained on item exposure order and post-trained with Rerank-DPO, and which uses Ordered Multi-token Prediction (OMTP) to accelerate inference. Offline experiments show that GReF outperforms state-of-the-art baselines on public and industrial data with latency close to that of non-autoregressive models, and its deployment at Kuaishou improves online metrics across views, engagement, and shares. The work indicates that end-to-end generative reranking can be both effective and deployable in real time in large-scale recommender systems.
Abstract
In a multi-stage recommendation system, reranking plays a crucial role in modeling intra-list correlations among items. A key challenge lies in exploring optimal sequences within the combinatorial space of permutations. Recent research follows a two-stage (generator-evaluator) paradigm, where a generator produces multiple feasible sequences and an evaluator selects the best one. In practice, the generator is typically implemented as an autoregressive model. However, these two-stage methods face two main challenges. First, the separation of the generator and evaluator hinders end-to-end training. Second, autoregressive generators suffer from low inference efficiency. In this work, we propose a Unified Generative Efficient Reranking Framework (GReF) to address both challenges. Specifically, we introduce Gen-Reranker, an autoregressive generator featuring a bidirectional encoder and a dynamic autoregressive decoder to generate causal reranking sequences. We then pre-train Gen-Reranker on the item exposure order for high-quality parameter initialization. To eliminate the need for the evaluator while still integrating sequence-level evaluation into training for end-to-end optimization, we propose post-training the model through Rerank-DPO. Moreover, for efficient autoregressive inference, we introduce ordered multi-token prediction (OMTP), which trains Gen-Reranker to generate multiple future items simultaneously while preserving their order, making deployment in real-time recommender systems practical. Extensive offline experiments demonstrate that GReF outperforms state-of-the-art reranking methods while achieving latency nearly comparable to that of non-autoregressive models. GReF has also been deployed in Kuaishou, a real-world video app with over 300 million daily active users, where it significantly improves online recommendation quality.
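The abstract does not spell out the Rerank-DPO objective. As a rough illustration of how a preference-based loss can fold sequence-level evaluation into training without a separate evaluator, the sketch below computes the standard DPO loss for one preferred and one dispreferred ordering of the same candidate items. The function names, the value of `beta`, the example probabilities, and the way preference pairs are built are all assumptions made for illustration, not the authors' formulation.

```python
import math


def sequence_log_prob(step_probs):
    """Log-probability of a full reranked list under an autoregressive model.

    `step_probs` holds the probability the model assigned to the item chosen
    at each position, so the list log-probability is the sum of their logs.
    """
    return sum(math.log(p) for p in step_probs)


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for a single preference pair (hypothetical helper).

    Arguments are sequence log-probabilities of the preferred ("chosen") and
    dispreferred ("rejected") orderings under the trainable policy and under
    a frozen reference model (assumed here to be the pre-trained model).
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Illustrative numbers only: per-position probabilities for two orderings.
policy_chosen = sequence_log_prob([0.6, 0.5, 0.9])
policy_rejected = sequence_log_prob([0.2, 0.4, 0.9])
ref_chosen = sequence_log_prob([0.4, 0.5, 0.9])
ref_rejected = sequence_log_prob([0.4, 0.4, 0.9])

loss = dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected)
print(f"DPO loss: {loss:.4f}")
```

When the policy and the reference model agree on both orderings the margin is zero and the loss equals log 2. The loss falls below that value as the policy shifts probability toward the preferred ordering relative to the reference, which is the signal that would replace an explicit evaluator in this kind of setup.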
