UNGER: Generative Recommendation with A Unified Code via Semantic and Collaborative Integration

Longtao Xiao, Haozhao Wang, Cheng Wang, Linfei Ji, Yifan Wang, Jieming Zhu, Zhenhua Dong, Rui Zhang, Ruixuan Li

TL;DR

The paper introduces UNGER, a generative recommender that fuses semantic and collaborative knowledge into a single unified code (Unicodes) to enable efficient autoregressive item generation. It tackles the semantic-dominance issue with a modality-adaptation layer, cross-modality alignment, and intra-modality distillation across two stages: Stage I learns integrated embeddings and discretizes them into unicodes; Stage II decodes user histories into unicode sequences with a distillation signal to recover information lost in quantization. Empirical results on three benchmarks show UNGER achieving state-of-the-art performance while reducing storage and improving inference speed compared with dual-code methods, and analyses reveal favorable scaling properties and robust hyper-parameter behavior. The approach offers a practical, extensible framework for unified multimodal representations in generative recommendation, with interpretable discrete codes that capture cross-modal concepts and user intent.

Abstract

With the rise of generative paradigms, generative recommendation has garnered increasing attention. Its core component is the item code, generally derived by quantizing collaborative or semantic representations, which serves as the identifier of candidate items in the context. However, existing methods typically construct separate codes for each modality, which raises computational and storage costs and hinders the integration of their complementary strengths. Given this limitation, we seek to integrate the two modalities into a unified code, fully exploiting their complementary nature. Nevertheless, the integration remains challenging: the integrated embedding obtained by the common concatenation method underutilizes collaborative knowledge, which limits its effectiveness. To address this, we propose a novel method, named UNGER, which integrates semantic and collaborative knowledge into a unified code for generative recommendation. Specifically, we propose to adaptively learn an integrated embedding through the joint optimization of cross-modality knowledge alignment and next-item prediction tasks. Subsequently, to mitigate the information loss caused by the quantization process, we introduce an intra-modality knowledge distillation task that uses the integrated embeddings as supervision signals to compensate. Extensive experiments on three widely used benchmarks demonstrate the superiority of our approach over existing methods.
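
To make the notion of an item code concrete, the following is a minimal sketch of how one integrated embedding could be quantized into a short tuple of discrete indices. The abstract does not specify the quantizer, so residual nearest-neighbor quantization is an assumption here, as are the random codebooks, the dimensions, and the helper names `squared_distance` and `residual_quantize`. The leftover residual it returns illustrates the kind of information loss that the distillation task is meant to compensate for.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(embedding, codebooks):
    """Map one embedding to a tuple of codeword indices, one per level.

    Assumption: residual quantization, where each level encodes what the
    previous levels failed to capture. The source abstract only says that
    item codes come from "quantizing" representations.
    """
    residual = list(embedding)
    code = []
    for codebook in codebooks:
        # Pick the codeword closest to the current residual.
        index = min(
            range(len(codebook)),
            key=lambda i: squared_distance(residual, codebook[i]),
        )
        code.append(index)
        # Subtract the chosen codeword; the remainder goes to the next level.
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return tuple(code), residual


# Hypothetical sizes and random codebooks, for illustration only.
rng = random.Random(0)
dim, levels, codebook_size = 8, 3, 16
codebooks = [
    [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(codebook_size)]
    for _ in range(levels)
]
integrated_embedding = [rng.gauss(0.0, 1.0) for _ in range(dim)]

code, leftover = residual_quantize(integrated_embedding, codebooks)
print("item code:", code)
print("information left out of the code:", squared_distance(leftover, [0.0] * dim))
```

With a single code per item, a generative recommender only has to emit one such tuple per item, rather than one tuple per modality as in the separate-code setups the abstract describes.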

Paper Structure

This paper contains 50 sections, 14 equations, 15 figures, 8 tables, 1 algorithm.

Figures (15)

  • Figure 1: Traditional vs. Generative Recommendation.
  • Figure 2: An example of item semantic knowledge.
  • Figure 3: Comparison of inference speed (seconds per sample, topk=5, beam size=100) between a unified-code setup and a two-separate-codes setup on the Beauty dataset.
  • Figure 4: Proportional similarity of the semantic modality and the collaborative modality to the final representation under the concatenation method on the Beauty dataset.
  • Figure 5: An overview of UNGER. UNGER consists of two stages. The first stage integrates semantic and collaborative knowledge to construct a unified code (Unicode) for each item. The second stage uses the resulting unicodes to perform generative recommendation. To encode both kinds of knowledge in a single unified code for generative recommendation, we introduce one auxiliary task per stage: a cross-modality knowledge alignment task (CKA) in the first stage and an intra-modality knowledge distillation task (IKD) in the second stage. In addition, the first stage introduces a modality adaptation layer with AdaLN to bridge the gap between the semantic and collaborative spaces.
  • ...and 10 more figures