RankGR: Rank-Enhanced Generative Retrieval with Listwise Direct Preference Optimization in Recommendation
Kairui Fu, Changfa Wu, Kun Yuan, Binbin Cao, Dunxian Huang, Yuliang Yan, Junjun Zheng, Jianning Zhang, Silu Zhou, Jian Wu, Kun Kuang
TL;DR
RankGR addresses two limitations of next-token prediction in generative retrieval (GR) with a two-phase framework: the Initial Assessment Phase (IAP) explicitly models hierarchical user preferences, and the Refined Scoring Phase (RSP) performs deep candidate–sequence interaction. In IAP, a listwise direct preference optimization (LDPO) objective captures multi-level user feedback (purchase, click, exposure, pseudo-exposure). In RSP, candidate-centric attention provides a lightweight yet expressive refinement of the candidates. Combined with asynchronous pre-computation, streaming updates, and caching, RankGR achieves strong offline performance on research and industrial datasets and online gains on the "Guess You Like" section of Taobao, demonstrating scalable, real-time generative retrieval. The work offers practical guidance for deploying GR systems in industrial settings and highlights the importance of modeling partial orders and item–sequence interactions for recommendation quality.
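To make the listwise idea concrete, here is a minimal sketch of a preference loss over multi-level feedback. It is not the paper's LDPO objective, whose exact form is not given in this summary. It assumes a DPO-style implicit reward, beta times the log-probability gap between the policy and a frozen reference model, and contrasts each item with every item at a strictly lower feedback level, so that items at the same level are left unordered (a partial order). The names `LEVELS`, `listwise_preference_loss`, and `beta`, and the numeric level values, are hypothetical.

```python
import math

# Hypothetical feedback levels; a higher value means a stronger preference.
LEVELS = {"pseudo_exposure": 0, "exposure": 1, "click": 2, "purchase": 3}


def listwise_preference_loss(policy_logps, ref_logps, feedback, beta=0.1):
    """Listwise loss over a partial order induced by feedback levels.

    policy_logps / ref_logps: per-item sequence log-probabilities under the
    trained model and a frozen reference model (assumed inputs).
    feedback: one key of LEVELS per item.
    """
    # DPO-style implicit reward for every item in the list.
    rewards = [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]
    ranks = [LEVELS[f] for f in feedback]

    loss, terms = 0.0, 0
    for r_i, lvl_i in zip(rewards, ranks):
        # Items at a strictly lower level act as negatives for item i.
        lower = [r_j for r_j, lvl_j in zip(rewards, ranks) if lvl_j < lvl_i]
        if not lower:
            continue  # nothing to contrast against
        pool = [r_i] + lower
        m = max(pool)
        # Numerically stable log-sum-exp over item i and its negatives.
        lse = m + math.log(sum(math.exp(x - m) for x in pool))
        loss += lse - r_i  # equals -log softmax(pool)[item i]
        terms += 1
    return loss / terms if terms else 0.0
```

When the policy equals the reference, every reward is zero and each term reduces to log(1 + k), where k is the number of lower-level items. The loss falls below that baseline once the policy assigns relatively more probability to higher-level items.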
Abstract
Generative retrieval (GR) has emerged as a promising paradigm in recommendation systems by autoregressively decoding identifiers of target items. Despite its potential, current approaches typically rely on the next-token prediction scheme, which treats each token of the next interacted item as the sole target. This narrow focus 1) limits their ability to capture the nuanced structure of user preferences, and 2) overlooks the deep interaction between decoded identifiers and user behavior sequences. In response to these challenges, we propose RankGR, a Rank-enhanced Generative Retrieval method that incorporates listwise direct preference optimization for recommendation. RankGR decomposes the retrieval process into two complementary stages: the Initial Assessment Phase (IAP) and the Refined Scoring Phase (RSP). In IAP, we incorporate a novel listwise direct preference optimization strategy into GR, facilitating a more comprehensive understanding of hierarchical user preferences and more effective partial-order modeling. The RSP then refines the top-λ candidates generated by IAP by modeling their interactions with the input sequence through a lightweight scoring module, leading to more precise candidate evaluation. Both phases are jointly optimized under a unified GR model, ensuring consistency and efficiency. Additionally, we implement several practical improvements in training and deployment, ultimately achieving a real-time system capable of handling nearly ten thousand requests per second. Extensive offline experiments on both research and industrial datasets, together with online gains on the "Guess You Like" section of Taobao, validate the effectiveness and scalability of RankGR.
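The two-stage flow described above (shortlist the top-λ candidates, then rescore them against the behavior sequence) can be sketched as follows. This is an illustration under stated assumptions, not the paper's scoring module. It assumes each candidate and each behavior in the sequence is already represented by an embedding of the same dimension, and it uses plain scaled dot-product attention with the candidate as the query and no learned projections. The function names `top_lambda`, `refine_score`, and `retrieve` are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_lambda(iap_scores, lam):
    """Indices of the lam highest first-stage (IAP) scores, best first."""
    order = sorted(range(len(iap_scores)), key=lambda i: iap_scores[i], reverse=True)
    return order[:lam]


def refine_score(candidate, sequence):
    """Candidate-centric attention: the candidate queries the behavior sequence."""
    scale = math.sqrt(len(candidate))
    logits = [dot(candidate, h) / scale for h in sequence]
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]  # stable softmax numerator
    z = sum(weights)
    weights = [w / z for w in weights]
    # Attention-weighted summary of the sequence from the candidate's viewpoint.
    context = [
        sum(w * h[d] for w, h in zip(weights, sequence))
        for d in range(len(candidate))
    ]
    # Assumed scoring head: similarity between candidate and its context.
    return dot(candidate, context)


def retrieve(iap_scores, cand_embs, sequence, lam):
    """Shortlist with first-stage scores, then rerank the shortlist only."""
    shortlist = top_lambda(iap_scores, lam)
    rescored = [(i, refine_score(cand_embs[i], sequence)) for i in shortlist]
    return sorted(rescored, key=lambda t: t[1], reverse=True)
```

Because only the λ shortlisted candidates pass through the attention step, the refinement cost grows with λ times the sequence length rather than with the size of the full item catalog, which is one plausible reading of why the second stage can stay lightweight.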
