DualGR: Generative Retrieval with Long and Short-Term Interests Modeling

Zhongchao Yi, Kai Feng, Xiaojian Ma, Yalong Wang, Yongqi Liu, Han Li, Zhengyang Zhou, Yang Wang

TL;DR

DualGR tackles the challenge of scalable, high-quality retrieval in industrial generative systems by explicitly modeling both long-term and short-term user interests through a Dual-Branch Long/Short-Term Router (DBR). It further controls context-induced noise with Search-based SID Decoding (S2D) and speeds up the fade-out of interests a user no longer engages with via an Exposure-aware Next-Token Prediction Loss (ENTP-Loss), all within an encoder-free, decoder-centric framework that operates on hierarchical semantic IDs. Empirical results from Kuaishou's large-scale short-video platform show consistent improvements in retrieval quality and online business metrics, including gains of +0.527% in video views and +0.432% in watch time without latency penalties. These contributions offer a practical, deployable paradigm for industrial generative retrieval that balances accuracy, diversity, and computational efficiency.
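To make the bucket-constrained idea behind S2D concrete, here is a minimal Python sketch under stated assumptions. It assumes three-level SIDs stored as `(level1, level2, level3)` tuples, additive log-scores across levels, and a caller-supplied fine-grained scoring function; the names `build_buckets`, `bucket_constrained_decode`, and `fine` are hypothetical. It shows only the search-space side of the idea (fine-grained ranking never leaves the chosen level-1 buckets), not the paper's attention-level implementation.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical three-level semantic ID: (level-1, level-2, level-3) codes.
SID = Tuple[int, int, int]


def build_buckets(items: Dict[str, SID]) -> Dict[int, List[str]]:
    """Group item ids by their coarse (level-1) code."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for item_id, sid in items.items():
        buckets[sid[0]].append(item_id)
    return dict(buckets)


def bucket_constrained_decode(
    items: Dict[str, SID],
    level1_scores: Dict[int, float],
    fine_score: Callable[[SID], float],
    top_buckets: int,
    top_items: int,
) -> List[str]:
    """Two-stage search: choose level-1 buckets, then rank level-2/3 only inside them."""
    buckets = build_buckets(items)
    # Stage 1: coarse decoding over level-1 codes that actually contain items.
    coarse = sorted(
        (c for c in buckets if c in level1_scores),
        key=lambda c: level1_scores[c],
        reverse=True,
    )[:top_buckets]
    # Stage 2: fine-grained scoring restricted to items that share a chosen level-1
    # code, so items from other buckets never compete at this stage.
    scored = [
        (level1_scores[c] + fine_score(items[item_id]), item_id)
        for c in coarse
        for item_id in buckets[c]
    ]
    # Highest total score first; ties broken by item id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:top_items]]


def fine(sid: SID) -> float:
    """Toy level-2/3 scorer with made-up weights (illustration only)."""
    return 0.1 * sid[1] + 0.01 * sid[2]


# Usage: toy catalog with made-up level-1 scores.
items = {"a": (0, 1, 2), "b": (0, 3, 1), "c": (1, 0, 0), "d": (2, 2, 2)}
level1_scores = {0: 2.0, 1: 1.5, 2: -1.0}
print(bucket_constrained_decode(items, level1_scores, fine, top_buckets=1, top_items=2))
# → ['b', 'a']
```

With `top_buckets=1`, only items sharing the best level-1 code are ranked, so an item in a low-scoring bucket cannot be returned however high its fine-grained score is. Limiting stage 2 to a few buckets also shrinks the set of items that must be scored per request.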

Abstract

In large-scale industrial recommendation systems, retrieval must produce high-quality candidates from massive corpora under strict latency constraints. Recently, Generative Retrieval (GR) has emerged as a viable alternative to Embedding-Based Retrieval (EBR): GR quantizes items into a finite token space and decodes candidates autoregressively, offering a scalable path that explicitly models target-history interactions via cross-attention. However, three challenges persist: 1) balancing users' long-term and short-term interests, 2) noise interference when generating hierarchical semantic IDs (SIDs), and 3) the absence of explicit modeling of negative feedback, such as items that were exposed but not clicked. To address these challenges, we propose DualGR, a generative retrieval framework that explicitly models dual horizons of user interests with selective activation. Specifically, DualGR uses a Dual-Branch Long/Short-Term Router (DBR) to cover both stable preferences and transient intents by explicitly modeling users' long- and short-term behaviors. Meanwhile, Search-based SID Decoding (S2D) controls context-induced noise and improves computational efficiency by constraining candidate interactions to the current coarse (level-1) bucket during fine-grained (level-2/3) SID prediction. Finally, we propose an Exposure-aware Next-Token Prediction Loss (ENTP-Loss) that treats "exposed-but-unclicked" items as hard negatives at level-1, enabling timely interest fade-out. On Kuaishou's large-scale short-video recommendation system, DualGR delivers measurable gains: online A/B testing shows lifts of +0.527% in video views and +0.432% in watch time, validating DualGR as a practical and effective paradigm for industrial generative retrieval.
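The abstract states only that ENTP-Loss treats exposed-but-unclicked items as hard negatives at level-1, so the following is one plausible form, sketched in plain Python rather than the paper's actual equation: standard next-token cross-entropy on the clicked item's level-1 code, plus an unlikelihood-style term, -log(1 - p), for each level-1 code of an exposed-but-unclicked item. The function names and the `neg_weight` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def entp_level1_loss(
    logits: Sequence[float],
    target: int,
    exposed_unclicked: Sequence[int],
    neg_weight: float = 1.0,
) -> float:
    """Hypothetical level-1 ENTP-style loss (assumed form, not the paper's equation).

    logits: scores over level-1 SID codes for the next item.
    target: level-1 code of the clicked item.
    exposed_unclicked: level-1 codes of items shown to the user but not clicked.
    """
    log_p = log_softmax(logits)
    loss = -log_p[target]  # standard next-token cross-entropy
    for neg in sorted(set(exposed_unclicked)):
        if neg == target:
            continue  # do not penalize the bucket the user actually clicked in
        p_neg = min(math.exp(log_p[neg]), 1.0 - 1e-12)  # guard against log(0)
        loss -= neg_weight * math.log1p(-p_neg)  # hard-negative term: -log(1 - p)
    return loss


# Usage: uniform logits over 4 level-1 codes; code 1 was exposed but skipped.
print(entp_level1_loss([0.0, 0.0, 0.0, 0.0], target=0, exposed_unclicked=[1]))
```

Because the extra term grows as the model assigns more probability to a level-1 bucket the user saw but skipped, minimizing it moves probability mass out of that bucket, which is the kind of fade-out effect the abstract describes.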

Paper Structure

This paper contains 13 sections, 9 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of the proposed DualGR.
  • Figure 2: Sensitivity analysis of DualGR.