RankGR: Rank-Enhanced Generative Retrieval with Listwise Direct Preference Optimization in Recommendation

Kairui Fu; Changfa Wu; Kun Yuan; Binbin Cao; Dunxian Huang; Yuliang Yan; Junjun Zheng; Jianning Zhang; Silu Zhou; Jian Wu; Kun Kuang

TL;DR

RankGR addresses the limitations of next-token prediction in generative retrieval (GR) with a two-phase framework: the Initial Assessment Phase (IAP) explicitly models hierarchical user preferences, and the Refined Scoring Phase (RSP) performs deep candidate–sequence interaction. The listwise direct preference optimization (LDPO) objective in IAP captures multi-level user feedback (purchase, click, exposure, pseudo-exposure) in a listwise fashion, while RSP provides a lightweight yet expressive refinement through candidate-centric attention. Combined with asynchronous pre-computation, streaming updates, and caching, RankGR achieves strong offline performance on large datasets and online gains on Taobao, demonstrating scalable, real-time generative retrieval. The work offers practical guidance for deploying GR systems in industrial settings and highlights the importance of modeling partial order and item–sequence interactions for improved recommendation quality.

Abstract

Generative retrieval (GR) has emerged as a promising paradigm in recommendation systems by autoregressively decoding identifiers of target items. Despite its potential, current approaches typically rely on the next-token prediction schema, which treats each token of the next interacted items as the sole target. This narrow focus (1) limits their ability to capture the nuanced structure of user preferences and (2) overlooks the deep interaction between decoded identifiers and user behavior sequences. In response to these challenges, we propose RankGR, a Rank-enhanced Generative Retrieval method that incorporates listwise direct preference optimization for recommendation. RankGR decomposes the retrieval process into two complementary stages: the Initial Assessment Phase (IAP) and the Refined Scoring Phase (RSP). In IAP, we incorporate a novel listwise direct preference optimization strategy into GR, facilitating a more comprehensive understanding of hierarchical user preferences and more effective partial-order modeling. The RSP then refines the top-λ candidates generated by IAP with a lightweight scoring module that models their interactions with the input sequence, leading to more precise candidate evaluation. Both phases are jointly optimized under a unified GR model, ensuring consistency and efficiency. Additionally, we implement several practical improvements in training and deployment, ultimately achieving a real-time system capable of handling nearly ten thousand requests per second. Extensive offline experiments on both research and industrial datasets, as well as online gains on the "Guess You Like" section of Taobao, validate the effectiveness and scalability of RankGR.
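
The control flow of the two phases described in the abstract (shortlist the top-λ candidates, then rescore only that shortlist) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `two_phase_retrieve`, `coarse_score`, `refined_score`, and the toy score tables are hypothetical names and values introduced here for illustration. In RankGR the first phase decodes identifiers autoregressively rather than scoring a fixed candidate list, and the second phase is a learned scoring module.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def two_phase_retrieve(
    candidates: Sequence[str],
    coarse_score: Callable[[str], float],   # hypothetical stand-in for IAP scoring
    refined_score: Callable[[str], float],  # hypothetical stand-in for RSP scoring
    top_lambda: int,
    top_k: int,
) -> List[Tuple[str, float]]:
    """Shortlist by a cheap score, then rerank the shortlist by a refined score."""
    # Phase 1: keep only the top-lambda candidates under the coarse score.
    shortlist = heapq.nlargest(top_lambda, candidates, key=coarse_score)
    # Phase 2: apply the refined scorer to the shortlist only.
    rescored = [(item, refined_score(item)) for item in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_k]


# Toy scores (assumed values, not taken from the paper).
coarse = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7}
refined = {"a": 0.2, "b": 0.6, "c": 1.0, "d": 0.9}

result = two_phase_retrieve(
    candidates=list(coarse),
    coarse_score=coarse.__getitem__,
    refined_score=refined.__getitem__,
    top_lambda=3,
    top_k=2,
)
print(result)
```

In this toy run, item `c` has the highest refined score but never reaches the second phase because the first phase drops it. That is the trade-off of any shortlist-then-rescore design: the quality of the refined ranking is bounded by the recall of the shortlist.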

Paper Structure (29 sections, 19 equations, 5 figures, 5 tables)

Introduction
Related Work
Generative Retrieval
Preference Alignment
Methodology
Problem Formulation
Initial Assessment Phase
Item-level modeling.
Listwise direct preference optimization.
Refined Scoring Phase
Candidate-Centric Interaction
Inference Procedure
System Deployment
Asynchronous Pre-computation Architecture
Real-time Retrieval and Serving
...and 14 more sections

Figures (5)

Figure 1: (a) Multi-stage architecture in modern recommender systems. (b) A brief diagram of generative retrieval.
Figure 2: Overview of the training of RankGR. (a) The generation process of semantic identifiers. (b) The initial assessment phase, which aims to capture both the sequential pattern and the partial-order relations. (c) The refined scoring phase, which provides a more accurate prediction for the candidates from the initial assessment.
Figure 3: Detailed deployment process of RankGR.
Figure 4: Performance of RankGR in terms of HR@1000 with respect to the number of codewords retained for RSP.
Figure 5: Performance of RankGR in terms of HR@1000 on the Taobao dataset with respect to the hyperparameter $\alpha$.
