Unifying Generative and Dense Retrieval for Sequential Recommendation

Liu Yang, Fabian Paischer, Kaveh Hassani, Jiacheng Li, Shuai Shao, Zhang Gabriel Li, Yun He, Xue Feng, Nima Noorshams, Sem Park, Bo Long, Robert D Nowak, Xiaoli Gao, Hamid Eghbalzadeh

TL;DR

The paper examines the trade-offs between generative retrieval and sequential dense retrieval for sequential recommendation. It reveals a performance gap, especially for cold-start items, and proposes LIGER, a hybrid model that fuses dense retrieval with semantic-ID-based generative retrieval. LIGER takes semantic IDs and item text representations as input, optimizes both a cosine-alignment objective and next-token prediction over semantic IDs, and augments the generated candidates with cold-start items before re-ranking. Across four small-scale benchmarks, LIGER narrows the gap to dense retrieval and improves cold-start recommendation, while highlighting the practical storage and inference efficiency benefits of the hybrid approach. The findings point toward robust, scalable hybrid retrievers for real-world recommendation systems, with remaining work on improving cold-start generation and evaluating at larger scales.
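The candidate augmentation and re-ranking step described above can be illustrated with a minimal sketch. It assumes that re-ranking uses cosine similarity between a predicted embedding and each candidate's text embedding; the function names (`cosine`, `hybrid_rerank`), the item keys, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def hybrid_rerank(predicted_embedding, generated_ids, cold_start_ids,
                  item_text_embeddings, top_k):
    """Merge generated candidates with cold-start items, then rank by cosine similarity.

    Assumption: the generated candidates come from some generative retriever
    (for example, beam search over semantic IDs); that step is not modeled here.
    """
    # Union of both candidate sources, preserving order and dropping duplicates.
    candidates = list(dict.fromkeys(list(generated_ids) + list(cold_start_ids)))
    scored = [(cosine(predicted_embedding, item_text_embeddings[i]), i)
              for i in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:top_k]]

# Toy data: "new" stands in for a cold-start item the generator did not produce.
item_text_embeddings = {
    "b": [0.0, 1.0],
    "c": [0.6, 0.8],
    "new": [0.9, 0.1],
}
ranked = hybrid_rerank([1.0, 0.1], ["b", "c"], ["new"], item_text_embeddings, top_k=2)
print(ranked)
```

In this toy case the cold-start item enters the final ranking only because it was added to the candidate set before scoring, which is the role the augmentation step plays in the description above.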

Abstract

Sequential dense retrieval models utilize advanced sequence learning techniques to compute item and user representations, which are then used to rank relevant items for a user through inner product computation between the user and all item representations. However, this approach requires storing a unique representation for each item, resulting in significant memory requirements as the number of items grows. In contrast, the recently proposed generative retrieval paradigm offers a promising alternative by directly predicting item indices using a generative model trained on semantic IDs that encapsulate items' semantic information. Despite its potential for large-scale applications, a comprehensive comparison between generative retrieval and sequential dense retrieval under fair conditions is still lacking, leaving open questions regarding performance and computation trade-offs. To address this, we compare these two approaches under controlled conditions on academic benchmarks and propose LIGER (LeveragIng dense retrieval for GEnerative Retrieval), a hybrid model that combines the strengths of these two widely used methods. LIGER integrates sequential dense retrieval into generative retrieval, mitigating performance differences and enhancing cold-start item recommendation in the datasets evaluated. This hybrid approach provides insights into the trade-offs between these approaches and demonstrates improvements in efficiency and effectiveness for recommendation systems in small-scale benchmarks.
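As a rough illustration of the dense retrieval ranking described in the abstract, the sketch below scores every item by its inner product with a user representation and keeps the top results. The function name `dense_retrieve` and the toy embeddings are assumptions made for illustration; a real system would use learned representations and a vectorized or approximate search.

```python
def dense_retrieve(user_embedding, item_embeddings, top_k):
    """Rank all items by inner product with the user representation.

    Every item needs its own stored vector, so both the scoring cost and the
    storage grow linearly with the number of items.
    """
    scores = [
        (sum(u * v for u, v in zip(user_embedding, emb)), idx)
        for idx, emb in enumerate(item_embeddings)
    ]
    # Highest score first; ties broken by the lower item index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scores[:top_k]]

# Toy catalog of three items with two-dimensional embeddings.
item_embeddings = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
user_embedding = [1.0, 0.0]

top_items = dense_retrieve(user_embedding, item_embeddings, top_k=2)
stored_floats = len(item_embeddings) * len(item_embeddings[0])
print(top_items, stored_floats)
```

The `stored_floats` count makes the memory concern concrete: it is the number of items times the embedding dimension, so it scales with the catalog size.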

Paper Structure

This paper contains 20 sections, 7 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Performance Comparison Between the Implemented Generative and Dense Retrieval Methods Across Datasets. Dense retrieval computes the inner product between predicted item representations and the entire item set, scaling with $\mathcal{O}(N)$ and requiring storage for $\mathcal{O}(N)$ embeddings. In contrast, generative retrieval stores only $\mathcal{O}(t)$ learnable embeddings and predicts the next item using beam search, scaling with $\mathcal{O}(tK)$, where $K$ is the beam size and $t$ is the number of Semantic IDs. Using identical item content information, both methods were trained on various datasets, and their performance, measured by Recall@10, is reported in the table on the right. While the implemented generative retrieval method reduces computational and storage costs, it shows lower performance compared to the implemented dense retrieval method in the datasets we evaluated.
  • Figure 2: Overview of Sequential Dense Retrieval, Generative Retrieval, and Our Hybrid Retrieval Method, LIGER. Dense Retrieval (upper left) uses an encoder model to map item IDs and text representations into dense embeddings, which are used to predict the next item in the sequence based on similarity. Generative Retrieval (lower left) employs an encoder-decoder Transformer to generate the next item's semantic ID from the given semantic ID trajectory. These semantic IDs are derived from item features such as title, brand, price, and category (upper right). Our proposed Hybrid Retrieval, LIGER (lower right), combines both semantic ID input and item text representations, integrating dense and generative retrieval techniques. It takes item positions, text representations, and semantic IDs as input, and outputs both the predicted item embedding and the next item's representation.
  • Figure 3: TIGER Fails to Generate Cold-Start Items. (a) The TIGER model generates a ranked list of candidates, with $p_K$ denoting the generation probability of the $K$-th ranked item over all items. The ground-truth cold-start item has a generation probability of $p^{\star}$. (b) A histogram compares $p_K$ (for $K=10$) with $p^{\star}$ when the ground-truth item is cold-start, highlighting the disparity between them. (c) The difference $p_{\text{diff}} = p_K - p^{\star}$ is plotted for $K = 10, 20, 40, 80$. A cold-start item is successfully generated only when $p_{\text{diff}} \leq 0$, illustrating the model’s limitations in handling cold-start items.
  • Figure 4: Inference Process
  • Figure 5: Overview of our Ablation Study. This study examines the effects of different components within LIGER (top middle), which integrates TIGER and semantic ID (SID)-based dense retrieval in a transductive setting. LIGER takes both the semantic ID and item text representation as inputs, predicting the SID and generating embeddings. We perform the following ablations to evaluate the impact of specific components: (1) To assess the effect of multi-objective optimization, we detach the gradient updates from the SID head (bottom middle). (2) To study the role of the embedding head, we remove it (top left). (3) To evaluate the contribution of the item text representation input in (2), we remove it, reducing the model to TIGER (bottom left). (4) To analyze the effect of the SID head, we remove it (top right). (5) Finally, we replace the SID with item IDs in (4), reducing the model to standard dense retrieval in a transductive setting (bottom right).
  • ...and 1 more figure
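
The criterion in the Figure 3 caption, $p_{\text{diff}} = p_K - p^{\star} \leq 0$, can be checked with a few lines of code. This is a minimal sketch under stated assumptions: the function names `p_diff` and `cold_start_item_generated` and the toy probabilities are hypothetical, and the probabilities are taken as given rather than produced by a model.

```python
def p_diff(candidate_probs, p_star, k):
    """Return p_K - p_star, where p_K is the K-th largest generation probability."""
    ranked = sorted(candidate_probs, reverse=True)
    p_k = ranked[k - 1]  # K is 1-indexed, as in the caption
    return p_k - p_star

def cold_start_item_generated(candidate_probs, p_star, k):
    """The ground-truth item makes the top-K list only if p_diff <= 0."""
    return p_diff(candidate_probs, p_star, k) <= 0

# Toy generation probabilities for four candidate items.
probs = [0.30, 0.25, 0.20, 0.10]

missed = cold_start_item_generated(probs, p_star=0.05, k=3)
found = cold_start_item_generated(probs, p_star=0.22, k=3)
print(missed, found)
```

With $K = 3$ the threshold is $p_K = 0.20$, so a ground-truth probability of 0.05 falls short while 0.22 clears it.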