Unified Semantic and ID Representation Learning for Deep Recommenders

Guanyu Lin, Zhigang Hua, Tao Feng, Shuang Yang, Bo Long, Jiaxuan You

TL;DR

The paper tackles the redundancy and cold-start limitations of ID-token-based recommenders by proposing a Unified Semantic and ID Representation Learning framework. It combines low-dimensional ID tokens with semantic tokens obtained via RQ-VAE, and introduces a hybrid distance scheme: cosine similarity in the earlier codebook layers decouples accumulated embeddings, while Euclidean distance in the final layer distinguishes unique items. End-to-end optimization combines the recommendation loss with the RQ-VAE quantization and text reconstruction losses, yielding 6–17% improvements on three benchmark datasets while reducing token size by over 80%. The work demonstrates that semantic and ID tokens are complementary, enabling better generalization and efficiency in large-scale sequential recommendation.
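The TL;DR names the three loss terms but not their exact form. The NumPy sketch below assumes common choices: softmax cross-entropy over items for the recommendation loss, the VQ-VAE-style codebook-plus-commitment loss summed over RQ-VAE layers (with a hypothetical commitment weight `beta = 0.25`), and mean squared error for text reconstruction. The function names and the weights `w_rq` and `w_text` are hypothetical; this only illustrates how the terms could be combined and is not the authors' implementation.

```python
import numpy as np


def softmax_cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean next-item cross-entropy. logits: (batch, n_items); targets: (batch,) ints."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())


def rq_quantization_loss(residuals, chosen_codes, beta: float = 0.25) -> float:
    """Assumed VQ-style loss summed over quantization layers.

    Per layer: ||sg[r] - e||^2 + beta * ||r - sg[e]||^2. With autograd, the first
    term would update only the codebook and the second only the encoder; in plain
    NumPy there are no gradients, so both terms share one value here.
    """
    total = 0.0
    for r, e in zip(residuals, chosen_codes):
        sq = float(((r - e) ** 2).sum(axis=-1).mean())
        total += sq + beta * sq
    return total


def text_reconstruction_loss(text_emb: np.ndarray, reconstructed: np.ndarray) -> float:
    """Mean squared reconstruction error of the item-text embedding (assumed form)."""
    return float(((text_emb - reconstructed) ** 2).sum(axis=-1).mean())


def total_loss(l_rec: float, l_rq: float, l_text: float,
               w_rq: float = 1.0, w_text: float = 1.0) -> float:
    """End-to-end objective; the weights are hypothetical hyperparameters."""
    return l_rec + w_rq * l_rq + w_text * l_text
```

Keeping the three terms as separate functions makes it easy to log each one and tune the hypothetical weights independently.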

Abstract

Effective recommendation is crucial for large-scale online platforms. Traditional recommendation systems primarily rely on ID tokens to uniquely identify items, which can effectively capture specific item relationships but suffer from issues such as redundancy and poor performance in cold-start scenarios. Recent approaches have explored using semantic tokens as an alternative, yet they face challenges, including item duplication and inconsistent performance gains, leaving the potential advantages of semantic tokens inadequately examined. To address these limitations, we propose a Unified Semantic and ID Representation Learning framework that leverages the complementary strengths of both token types. In our framework, ID tokens capture unique item attributes, while semantic tokens represent shared, transferable characteristics. Additionally, we analyze the role of cosine similarity and Euclidean distance in embedding search, revealing that cosine similarity is more effective at decoupling accumulated embeddings, while Euclidean distance excels at distinguishing unique items. Our framework integrates cosine similarity in earlier layers and Euclidean distance in the final layer to optimize representation learning. Experiments on three benchmark datasets show that our method significantly outperforms state-of-the-art baselines, with improvements ranging from 6% to 17% and a reduction in token size of over 80%. These results demonstrate the effectiveness of combining ID and semantic tokenization to enhance the generalization ability of recommender systems.
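A minimal NumPy sketch of the hybrid code-selection rule, assuming a plain residual quantizer with fixed, already-trained codebooks (no encoder, no training loop). Only the rule itself, cosine similarity in every layer except the last and Euclidean distance in the last, comes from the text; the function names `normalize` and `hybrid_residual_quantize` and all shapes are illustrative assumptions.

```python
import numpy as np


def normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """L2-normalize along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def hybrid_residual_quantize(z: np.ndarray, codebooks: list):
    """Residual quantization with cosine selection in earlier layers and
    Euclidean selection in the final layer (illustrative sketch).

    z:         (n, d) item embeddings, e.g. encoded item text
    codebooks: list of L arrays, each (K, d)
    Returns (codes, z_hat): codes is (n, L) ints; z_hat sums the chosen code vectors.
    """
    residual = z.copy()
    z_hat = np.zeros_like(z)
    codes = []
    last = len(codebooks) - 1
    for layer, cb in enumerate(codebooks):
        if layer < last:
            # Cosine similarity: compares directions only, ignoring magnitudes.
            scores = normalize(residual) @ normalize(cb).T  # (n, K)
            idx = scores.argmax(axis=1)
        else:
            # Squared Euclidean distance: sensitive to the exact position.
            d2 = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
            idx = d2.argmin(axis=1)
        chosen = cb[idx]      # (n, d)
        z_hat += chosen
        residual -= chosen    # the next layer quantizes what is left
        codes.append(idx)
    return np.stack(codes, axis=1), z_hat


rng = np.random.default_rng(0)
z = rng.normal(size=(5, 16))
codebooks = [rng.normal(size=(32, 16)) for _ in range(3)]
codes, z_hat = hybrid_residual_quantize(z, codebooks)
print(codes.shape)  # (5, 3): three semantic tokens per item
```

Because cosine selection ignores magnitude, residuals that point in the same direction share a code, which groups related items; the final Euclidean layer also accounts for magnitude, which is what lets it separate otherwise similar items.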

Paper Structure

This paper contains 34 sections, 2 equations, 16 figures, 7 tables, and 1 algorithm.

Figures (16)

  • Figure 1: Visualization of ID tokens on the Amazon Beauty dataset. ID tokens of the same color lie close together in the embedding space, which suggests they can be compressed and represented by shared semantic tokens.
  • Figure 2: Framework of the unified semantic and ID representation learning. Firstly, the model integrates both semantic tokens, learned through RQ-VAE, and ID tokens for the recommendation task. Secondly, cosine similarity is applied in the first two layers to decouple accumulated embeddings, while Euclidean distance is utilized in the final layer to effectively distinguish unique items. Finally, the overall model is optimized in an end-to-end manner, combining the recommendation loss, RQ-VAE quantization loss, and text reconstruction loss.
  • Figure 3: Illustration of unified semantic and ID tokenization. Specifically, we replace ID tokens with low-dimensional ID tokens and semantic tokens.
  • Figure 4: Visualization of the codebook selection using cosine similarity across three layers. This figure shows the count of items from various categories assigned to specific token indices, with a focus on the top-3 codebook indices that contain the highest number of items. The distinct distribution of items across different indices suggests that cosine similarity effectively captures category-specific information and helps in distinguishing between categories.
  • Figure 5: Visualization of the codebook selection using Euclidean distance across three layers. The uniform distribution of items across categories in the first layer indicates that Euclidean distance struggles to effectively capture category-specific information at this stage, making it less capable of distinguishing between categories compared to later layers.
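Figure 3 replaces full ID tokens with low-dimensional ID tokens plus semantic tokens. The sketch below shows one way such a unified item embedding could be built, assuming the low-dimensional ID vector is concatenated with the sum of per-layer semantic-token embeddings. The class `UnifiedItemEmbedding`, the concatenate-and-sum design, and every size in the usage line are hypothetical, not the paper's configuration; the parameter count only illustrates why sharing semantic codebooks across items can shrink the embedding tables.

```python
import numpy as np


class UnifiedItemEmbedding:
    """Hypothetical unified embedding: per-item low-dimensional ID vector
    concatenated with the sum of the item's semantic-token embeddings."""

    def __init__(self, n_items, d_id, n_layers, codebook_size, d_sem, seed=0):
        rng = np.random.default_rng(seed)
        # Small per-item table: captures item-specific signal only.
        self.id_table = rng.normal(scale=0.02, size=(n_items, d_id))
        # One table per semantic-token position, shared by all items.
        self.sem_tables = rng.normal(scale=0.02, size=(n_layers, codebook_size, d_sem))

    def __call__(self, item_ids, sem_codes):
        """item_ids: (batch,) ints; sem_codes: (batch, n_layers) ints from the quantizer."""
        id_part = self.id_table[item_ids]                          # (batch, d_id)
        layers = np.arange(self.sem_tables.shape[0])               # (n_layers,)
        sem_part = self.sem_tables[layers, sem_codes].sum(axis=1)  # (batch, d_sem)
        return np.concatenate([id_part, sem_part], axis=-1)

    def num_params(self):
        return self.id_table.size + self.sem_tables.size


# Illustrative sizes only: 10,000 items, 8-dim ID part, 3 codebooks of 256 codes, 56-dim semantic part.
emb = UnifiedItemEmbedding(n_items=10_000, d_id=8, n_layers=3, codebook_size=256, d_sem=56)
full_id_params = 10_000 * (8 + 56)  # a conventional 64-dim ID table for the same items
print(emb.num_params(), full_id_params)  # 123008 640000
```

The savings come from the semantic tables scaling with the codebook size rather than the number of items; how the paper itself measures "token size" may differ from this parameter count.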