CoST: Contrastive Quantization based Semantic Tokenization for Generative Recommendation

Jieming Zhu, Mengqun Jin, Qijiong Liu, Zexuan Qiu, Zhenhua Dong, Xiu Li

TL;DR

CoST introduces contrastive quantization for semantic tokenization to address limitations of reconstruction-based tokenization in generative recommender systems. By replacing exact reconstruction with a batch-level contrastive objective, CoST preserves item neighborhood relationships in the token space, enabling more effective autoregressive item generation. Empirical results on MIND and Amazon Office show substantial gains over RQ-VAE baselines, with Recall@5 improving by up to 43% and NDCG@5 by up to 44% on MIND. This work highlights the critical role of semantic tokenization quality in generative retrieval and suggests directions for further integrating neighborhood signals and multimodal information into tokenization.

Abstract

Embedding-based retrieval serves as a dominant approach to candidate item matching for industrial recommender systems. With the success of generative AI, generative retrieval has recently emerged as a new retrieval paradigm for recommendation, which casts item retrieval as a generation problem. Its model consists of two stages: semantic tokenization and autoregressive generation. The first stage involves item tokenization that constructs discrete semantic tokens to index items, while the second stage autoregressively generates semantic tokens of candidate items. Therefore, semantic tokenization serves as a crucial preliminary step for training generative recommendation models. Existing research usually employs a vector quantizer with reconstruction loss (e.g., RQ-VAE) to obtain semantic tokens of items, but this method fails to capture the essential neighborhood relationships that are vital for effective item modeling in recommender systems. In this paper, we propose a contrastive quantization-based semantic tokenization approach, named CoST, which harnesses both item relationships and semantic information to learn semantic tokens. Our experimental results highlight the significant impact of semantic tokenization on generative recommendation performance, with CoST achieving up to a 43% improvement in Recall@5 and 44% improvement in NDCG@5 on the MIND dataset over previous baselines.
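
To make the contrast between the two tokenization objectives concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes greedy residual quantization over `M` codebooks and an in-batch InfoNCE-style loss as one plausible form of a "batch-level contrastive objective". It omits the encoder and decoder that an RQ-VAE would place around the quantizer and performs no training. The function names, the temperature `tau=0.1`, and the random data are hypothetical; `K=64`, `M=3`, and `d=96` are borrowed from the Figure 4 caption.

```python
import numpy as np


def residual_quantize(z, codebooks):
    """Greedy residual quantization (illustrative, no learning).

    z: (B, d) item vectors; codebooks: (M, K, d).
    Returns token ids of shape (B, M) and the quantized vectors (B, d).
    """
    residual = np.array(z, dtype=float)  # copy so the input is not modified
    quantized = np.zeros_like(residual)
    tokens = []
    for codebook in codebooks:
        # Squared Euclidean distance from every residual to every codeword.
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)       # nearest codeword per item
        chosen = codebook[idx]
        quantized += chosen              # accumulate the approximation
        residual -= chosen               # next level quantizes what is left
        tokens.append(idx)
    return np.stack(tokens, axis=1), quantized


def reconstruction_loss(x, x_hat):
    """Mean squared L2 error: each item is compared only with itself."""
    return float(((x - x_hat) ** 2).sum(axis=1).mean())


def contrastive_loss(x, x_hat, tau=0.1, eps=1e-12):
    """In-batch InfoNCE (assumed form): x_hat[i] should be more similar
    to x[i] than to the other items x[j] in the same batch."""
    x_n = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    h_n = x_hat / (np.linalg.norm(x_hat, axis=1, keepdims=True) + eps)
    logits = h_n @ x_n.T / tau                      # (B, B) cosine / tau
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.diag(log_prob).mean())


rng = np.random.default_rng(0)
B, d, M, K = 8, 96, 3, 64                 # K, M, d as in the Figure 4 caption
x = rng.normal(size=(B, d))               # stand-in item embeddings
codebooks = rng.normal(size=(M, K, d))    # untrained, random codebooks
tokens, x_hat = residual_quantize(x, codebooks)
print(tokens.shape)                        # one M-token id tuple per item
print(reconstruction_loss(x, x_hat), contrastive_loss(x, x_hat))
```

The design point the sketch tries to show is that the reconstruction loss scores each item in isolation, whereas the contrastive loss depends on how every quantized item relates to the other items in the batch, which is where neighborhood information can enter.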

Paper Structure

This paper contains 11 sections, 4 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: A framework of generative recommendation, including a tokenization phase and a generation phase.
  • Figure 2: The vector quantization workflow trained via reconstructive quantization and contrastive quantization.
  • Figure 3: Analysis on temperature $\tau$ and training epochs $e$ on the MIND dataset.
  • Figure 4: Sensitivity analysis on codebook size $K$ (fixed $M=3$ and $d=96$), number of codebooks $M$ (fixed $K=64$ and $d=96$), embedding dimension $d$ (fixed $K=64$ and $M=3$) on the MIND dataset.
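
As a rough aid for reading the Figure 4 settings (this is standard counting for multi-codebook quantizers, not a statement taken from the paper): if each item is indexed by one token from each of $M$ codebooks of size $K$, the number of distinct token tuples is at most

$$K^{M} = 64^{3} = 262{,}144 \quad \text{for } K=64,\ M=3.$$

Increasing either $K$ or $M$ enlarges this index space, which is one reason both appear as axes of the sensitivity analysis.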