Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations

Anima Singh; Trung Vu; Nikhil Mehta; Raghunandan Keshavan; Maheswaran Sathiamoorthy; Yilin Zheng; Lichan Hong; Lukasz Heldt; Li Wei; Devansh Tandon; Ed H. Chi; Xinyang Yi

TL;DR

Random ID hashing in massive embedding tables hampers generalization to unseen items. The paper proposes Semantic IDs derived from frozen content embeddings via a Residual Quantization VAE (RQ-VAE) to compress content signals into discrete tokens, balancing memorization and generalization. It introduces two SID adaptation methods—N-gram and SentencePiece (SPM)—with SPM showing superior performance at scale on a YouTube ranking model, using $L=8$ and $K=2048$. Experiments demonstrate improved generalization to new and long-tail items without sacrificing overall CTR AUC, supporting production viability and stability of semantic token-based representations.

Abstract

Randomly hashed item IDs are used ubiquitously in recommendation models. However, the representations learned from random hashing prevent generalization across similar items, causing problems in learning unseen and long-tail items, especially when the item corpus is large, power-law distributed, and evolving dynamically. In this paper, we propose using content-derived features as a replacement for random IDs. We show that simply replacing ID features with content-based embeddings can cause a drop in quality due to reduced memorization capability. To strike a good balance between memorization and generalization, we propose to use Semantic IDs -- a compact discrete item representation learned from frozen content embeddings using RQ-VAE that captures the hierarchy of concepts in items -- as a replacement for random item IDs. As with content embeddings, the compactness of Semantic IDs makes them hard to adapt easily in recommendation models. We propose novel methods for adapting Semantic IDs in industry-scale ranking models by hashing sub-pieces of the Semantic ID sequences. In particular, we find that the SentencePiece model, which is commonly used in LLM tokenization, outperforms manually crafted pieces such as N-grams. To this end, we evaluate our approaches in a real-world ranking model for YouTube recommendations. Our experiments demonstrate that Semantic IDs can replace the direct use of video IDs, improving generalization on new and long-tail item slices without sacrificing overall model quality.
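The idea of hashing sub-pieces of a Semantic ID sequence can be illustrated with a small sketch. This is not the paper's implementation: the function names, the single shared embedding table, the use of CRC32 as the hash, and the choice to include the piece's position in the hash key are all assumptions made for illustration. The sequence (1, 4, 6, 2) is the example Semantic ID from Figure 1; the second video is hypothetical.

```python
import zlib

import numpy as np


def ngram_buckets(semantic_id, n, num_buckets):
    """Hash each contiguous, non-overlapping n-gram of the code sequence to a bucket.

    The start position is part of the hash key (an assumption), so the same
    codes at different levels map to different buckets.
    """
    buckets = []
    for start in range(0, len(semantic_id), n):
        piece = semantic_id[start:start + n]
        key = f"{start}:" + ",".join(str(code) for code in piece)
        buckets.append(zlib.crc32(key.encode("utf-8")) % num_buckets)
    return buckets


def embed(semantic_id, table, n=2):
    """Sum the embedding rows selected by the hashed n-grams."""
    rows = ngram_buckets(semantic_id, n, table.shape[0])
    return table[rows].sum(axis=0)


# Hypothetical embedding table: 1024 buckets, 4-dimensional embeddings.
rng = np.random.default_rng(0)
table = rng.normal(size=(1024, 4))

video_a = (1, 4, 6, 2)  # Semantic ID from the Figure 1 caption
video_b = (1, 4, 3, 7)  # hypothetical video sharing the first two codes

buckets_a = ngram_buckets(video_a, 2, table.shape[0])
buckets_b = ngram_buckets(video_b, 2, table.shape[0])
embedding_a = embed(video_a, table)
```

Because the two videos share the prefix (1, 4), their first bigram lands in the same bucket and therefore shares an embedding row. That shared row is what lets a new or long-tail item inherit signal from semantically similar items, whereas a random hash of the video ID gives no such overlap.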

Paper Structure (18 sections, 7 figures, 1 table)

Introduction
Related Work
Embedding learning
Cold-start and content information
Discrete representations
Proposed Approaches
Overview
RQ-VAE for Semantic IDs (SIDs)
Semantic ID Representation in Ranking
Experiments
Experimental Setup
Performance of Semantic ID
Conclusion and Future Work
Appendix
RQ-VAE Training and Serving Setup
...and 3 more sections

Figures (7)

Figure 1: Illustration of RQ-VAE: The input vector ${\bm{x}}$ is encoded into a latent ${\bm{z}}$, which is then recursively quantized by looking up the nearest codebook vector of the residual at each level. In this figure, the item represented by ${\bm{x}}$ has $(1,4,6,2)$ as its Semantic ID.
Figure 2: Percentage improvement in CTR AUC metric when user history is not used as an input feature. Improvement is relative to Random Hashing baseline with 8K embedding table size.
Figure 3: Percentage improvement in CTR AUC metric when user history is used as an input feature. Improvement is relative to Random Hashing baseline with 8K embedding table size.
Figure 4: Number of Subword Embeddings per video.
Figure 5: Comparison of ranking performance of SID-3Bigram-sum representation derived using RQ-VAE$_{v0}$ and RQ-VAE$_{v1}$.
...and 2 more figures
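
The recursive quantization described in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration of the lookup step only, assuming the latent vector and the codebooks are already given; the encoder, decoder, and codebook training of RQ-VAE are omitted, and the toy codebooks, the function name, and the 0-based code indices are assumptions.

```python
import numpy as np


def residual_quantize(z, codebooks):
    """Return one code index per level by quantizing the running residual.

    At each level, pick the codebook vector nearest to the current residual,
    record its index, and subtract it before moving to the next level.
    """
    residual = np.asarray(z, dtype=float)
    codes = []
    for codebook in codebooks:  # each codebook has shape (K, d)
        distances = np.linalg.norm(codebook - residual, axis=1)
        index = int(np.argmin(distances))
        codes.append(index)
        residual = residual - codebook[index]
    return tuple(codes)


# Toy two-level codebooks with K=2 entries of dimension 2 (made up).
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),
    np.array([[0.0, 0.0], [0.25, -0.25]]),
]

semantic_id = residual_quantize([1.2, 0.8], codebooks)
print(semantic_id)  # (1, 1)
```

The first level captures the coarse position of the latent and each later level refines what is left over, which is why items that share a prefix of codes are close in the content embedding space.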
