Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations

Yuhao Yang, Zhi Ji, Zhaopeng Li, Yi Li, Zhonglin Mo, Yue Ding, Kai Chen, Zijian Zhang, Jie Li, Shuanglong Li, Lin Liu

TL;DR

COBRA tackles the mismatch between generative and dense retrieval in recommender systems by cascading sparse semantic IDs with learnable dense vectors. Within a Transformer-based architecture, it alternates between generating sparse IDs and refined dense representations, and it is trained end-to-end with a dual objective so that the dense representations are refined dynamically. A coarse-to-fine generation process, augmented by BeamFusion, yields high-precision and diverse recommendations. The approach is evaluated on public benchmarks, in industrial-scale offline experiments, and in online A/B tests on a real-world advertising platform with over 200 million daily users. The reported gains in recall, NDCG, and online metrics support COBRA as a scalable, practical approach for unified generative and dense retrieval in large-scale recommendation systems.

Abstract

Generative models have recently gained attention in recommendation systems by directly predicting item identifiers from user interaction sequences. However, existing methods suffer from significant information loss due to the separation of stages such as quantization and sequence modeling, hindering their ability to achieve the modeling precision and accuracy of sequential dense retrieval techniques. Integrating generative and dense retrieval methods remains a critical challenge. To address this, we introduce the Cascaded Organized Bi-Represented generAtive retrieval (COBRA) framework, which innovatively integrates sparse semantic IDs and dense vectors through a cascading process. Our method alternates between the two representations: it first generates sparse IDs, which then serve as conditions for generating the dense vectors. End-to-end training enables dynamic refinement of dense representations, capturing both semantic insights and collaborative signals from user-item interactions. During inference, COBRA employs a coarse-to-fine strategy, first generating sparse IDs and then refining them into dense vectors via the generative model. We further propose BeamFusion, an innovative approach combining beam search with nearest neighbor scores to enhance inference flexibility and recommendation diversity. Extensive experiments on public datasets and offline tests validate our method's robustness. Online A/B tests on a real-world advertising platform with over 200 million daily users demonstrate substantial improvements in key metrics, highlighting COBRA's practical advantages.
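The coarse-to-fine inference step with BeamFusion can be illustrated with a small sketch. This is not the authors' implementation: the abstract only says that beam search scores are combined with nearest neighbor scores, so the fusion rule used here (a product of softmax-normalized beam scores and softmax-normalized cosine similarities), the scaling coefficients `tau` and `psi`, the brute-force search standing in for ANN, and all function and parameter names are assumptions made for illustration.

```python
import math


def softmax(values, scale=1.0):
    """Numerically stable softmax of scale * values."""
    scaled = [scale * v for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def beam_fusion(beams, item_embeddings, top_k, n_neighbors=2, tau=1.0, psi=1.0):
    """Rank items by fusing beam scores with nearest-neighbor similarity.

    beams: list of (sparse_id, beam_score, dense_vector), one per beam,
           where dense_vector is the vector generated after that sparse ID.
    item_embeddings: dict mapping item_id -> dense item embedding.
    The fusion rule below is an assumed stand-in, not the paper's formula.
    """
    if not beams:
        return []
    beam_weights = softmax([score for _, score, _ in beams], tau)

    # Brute-force nearest neighbors per beam (a stand-in for ANN search).
    candidates = []  # (beam index, item id, cosine similarity)
    for beam_index, (_, _, query) in enumerate(beams):
        scored = sorted(
            ((cosine(query, emb), item_id) for item_id, emb in item_embeddings.items()),
            reverse=True,
        )
        for similarity, item_id in scored[:n_neighbors]:
            candidates.append((beam_index, item_id, similarity))
    if not candidates:
        return []

    # Normalize similarities over the pooled candidates, then fuse.
    sim_weights = softmax([sim for _, _, sim in candidates], psi)
    fused = {}
    for (beam_index, item_id, _), sim_weight in zip(candidates, sim_weights):
        score = beam_weights[beam_index] * sim_weight
        # An item reached from several beams keeps its best fused score.
        fused[item_id] = max(fused.get(item_id, 0.0), score)

    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)[:top_k]
```

In this sketch, raising `tau` concentrates the ranking on the highest-scoring beams, while lowering it lets candidates from weaker beams compete, which is one way the precision and diversity trade-off described in the abstract could be exposed as a tunable setting.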

Paper Structure

This paper contains 28 sections, 14 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Comparison of generative recommendation paradigms. The left section illustrates traditional generative retrieval approaches, exemplified by TIGER, which utilize a sequence of sparse IDs as input within a Transformer encoder-decoder architecture to directly predict the sparse ID of the next item. The right section depicts the proposed COBRA framework, which employs Cascaded Organized Bi-Represented generAtive retrieval. This approach integrates sparse IDs to capture coarse-grained semantic information and dense vectors to encapsulate fine-grained detail. The cascaded representation is processed by a Transformer decoder that sequentially predicts the sparse ID followed by the dense vector.
  • Figure 2: The architecture of COBRA. The model employs a cascaded sparse-dense representation approach, where sparse IDs are generated via Residual Quantization and dense vectors are produced by a trainable Transformer Encoder. These representations serve as inputs to a Transformer Decoder, which alternates between predicting sparse IDs and dense vectors. The predicted outputs are used to compute the loss functions $\mathcal{L}_{\text{sparse}}$ and $\mathcal{L}_{\text{dense}}$. For the sake of simplicity, the figure illustrates an example with a single level of sparse ID.
  • Figure 3: Illustration of the Coarse-to-Fine Generation process. During inference, $M$ sparse IDs are generated via Beam Search, and appended to the sequence. Dense vectors are then generated and used in ANN to obtain candidate items. BeamFusion combines beam scores and similarity scores to rank candidates, from which the top $K$ items are selected.
  • Figure 4: Cosine similarity matrices for advertisement dense embeddings. (a) COBRA's dense embeddings exhibit strong intra-ID cohesion and inter-ID separation. (b) COBRA w/o ID shows weaker category separation. (c) The difference matrix quantifies the enhancement in cohesion and separation when sparse IDs are incorporated.
  • Figure 5: Embedding Visualization using t-SNE. The plot illustrates the distribution of 10,000 randomly sampled advertisement embeddings in a two-dimensional space for COBRA. Distinct clustering centers are observed for various IDs.
  • ...and 1 more figure
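The Figure 2 caption states that sparse IDs are generated via Residual Quantization. As a rough illustration of that general technique, the sketch below quantizes a vector level by level: each level picks the nearest codeword and passes the remaining residual to the next level, so the resulting tuple of indices acts as a coarse-to-fine ID. The hand-written codebooks, the function names, and the squared Euclidean distance are assumptions for illustration; how COBRA learns its codebooks is not specified in this section.

```python
def nearest(codebook, vector):
    """Index of the codeword closest to vector (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def residual_quantize(vector, codebooks):
    """Return one codeword index per level plus the final residual."""
    residual = list(vector)
    ids = []
    for codebook in codebooks:
        index = nearest(codebook, residual)
        ids.append(index)
        # Subtract the chosen codeword; the next level quantizes what is left.
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return ids, residual


def reconstruct(ids, codebooks):
    """Sum the selected codewords to approximate the original vector."""
    approx = [0.0] * len(codebooks[0][0])
    for index, codebook in zip(ids, codebooks):
        approx = [a + c for a, c in zip(approx, codebook[index])]
    return approx


# Hypothetical two-level codebooks (a real system would learn these).
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],    # level 1: coarse directions
    [[0.25, 0.0], [0.0, 0.25]],  # level 2: finer corrections
]
ids, residual = residual_quantize([1.0, 0.25], codebooks)
```

With these toy codebooks, the first index captures the coarse direction of the vector and the second captures the correction; the figure itself shows the simpler case of a single level of sparse ID.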