DAS: Dual-Aligned Semantic IDs Empowered Industrial Recommender System

Wencai Ye, Mingjie Sun, Shaoyun Shi, Peng Wang, Wenjin Wu, Peng Jiang

TL;DR

DAS tackles the misalignment between No-Aligned Semantic IDs (SIDs) derived from multi-modal embeddings and the collaborative filtering (CF) signals used downstream in industrial recommender systems. It introduces a one-stage co-training framework that jointly optimizes SID quantization and alignment with CF representations, using UISM for semantic extraction, ICDM for debiasing, and MDAM for multi-view alignment, with a mutual-information objective $I(\mathrm{SID};\mathrm{CF})$. The approach yields dual-aligned SIDs for users and items, built on RQ-VAE-based quantization and six MDAM alignments, with debiasing to mitigate popularity and conformity biases. Empirical results on billion-scale real data show consistent offline CTR improvements and sizable online eCPM gains, with pronounced benefits in cold-start settings, demonstrating the practical value of integrating multi-modal content with behavioral signals in large-scale industrial recommender systems. DAS has been deployed in Kuaishou’s ad ecosystem, serving hundreds of millions of users daily and supporting both discriminative and generative recommendation tasks at scale.
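To make the quantization step concrete, here is a minimal sketch of greedy residual quantization, the codebook lookup at the core of RQ-VAE-style models. It is an illustration under assumptions, not the DAS implementation: the function name `residual_quantize`, the toy codebooks, and the use of plain NumPy without any encoder, decoder, or training loss are all hypothetical choices made for this example.

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Greedy residual quantization of a single embedding.

    x: (d,) embedding vector.
    codebooks: list of (K, d) arrays, one codebook per level (toy values here).
    Returns the chosen code indices (a semantic-ID-like tuple) and the
    reconstruction obtained by summing the selected codewords.
    """
    residual = np.asarray(x, dtype=np.float64)
    recon = np.zeros_like(residual)
    sid = []
    for cb in codebooks:
        # Pick the codeword closest (squared L2) to what is still unexplained.
        dists = np.sum((cb - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        sid.append(idx)
        recon = recon + cb[idx]
        residual = residual - cb[idx]
    return sid, recon

# Hypothetical two-level codebooks with two codewords each.
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),
    np.array([[0.0, 0.0], [0.5, 0.0]]),
]
sid, recon = residual_quantize(np.array([1.4, 1.0]), codebooks)
# sid == [1, 1]; recon is the sum of the two selected codewords.
```

Each level encodes the residual left by the previous one, so the index sequence reads coarse to fine. In a trained RQ-VAE the codebooks are learned jointly with an encoder and decoder; that part is omitted here.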

Abstract

Semantic IDs are discrete identifiers generated by quantizing embeddings from Multi-modal Large Language Models (MLLMs), enabling efficient multi-modal content integration in recommendation systems. However, their lack of collaborative signals results in a misalignment with downstream discriminative and generative recommendation objectives. Recent studies have introduced various alignment mechanisms to address this problem, but their two-stage framework design still leads to two main limitations: (1) inevitable information loss during alignment, and (2) inflexibility in applying adaptive alignment strategies, which in turn constrains mutual information maximization during the alignment process. To address these limitations, we propose a novel and flexible one-stage Dual-Aligned Semantic IDs (DAS) method that simultaneously optimizes quantization and alignment, preserving semantic integrity and alignment quality while avoiding the information loss typically associated with two-stage methods. Meanwhile, DAS achieves more efficient alignment between the semantic IDs and collaborative signals through two approaches: (1) Multi-view Contrastive Alignment: To maximize mutual information between semantic IDs and collaborative signals, we first incorporate an ID-based CF debias module, and then design three effective contrastive alignment methods: dual user-to-item (u2i), dual item-to-item/user-to-user (i2i/u2u), and dual co-occurrence item-to-item/user-to-user (i2i/u2u). (2) Dual Learning: By aligning the dual quantizations of users and ads, the constructed semantic IDs for users and ads achieve stronger alignment. Finally, we conduct extensive offline experiments and online A/B tests to evaluate the effectiveness of DAS, which is now deployed across various advertising scenarios in the Kuaishou App, serving over 400 million users daily.
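The contrastive alignment idea in the abstract can be illustrated with an in-batch InfoNCE loss, a common contrastive objective whose minimization raises a lower bound on the mutual information between the two views. The sketch below is an assumption-laden illustration rather than the paper's loss: the function name `info_nce`, the temperature value, the cosine normalization, and the one-directional form are all choices made for this example, and the abstract does not specify the exact formulation used in DAS.

```python
import numpy as np

def info_nce(sem, cf, temperature=0.1):
    """In-batch InfoNCE between two views of the same entities.

    sem: (B, d) semantic-side representations (e.g. quantized embeddings).
    cf:  (B, d) collaborative-filtering representations.
    Row i of `sem` and row i of `cf` form the positive pair; every other
    row in the batch serves as a negative. The temperature is an assumed value.
    """
    sem = sem / np.linalg.norm(sem, axis=1, keepdims=True)
    cf = cf / np.linalg.norm(cf, axis=1, keepdims=True)
    logits = sem @ cf.T / temperature                 # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilise the softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy check: matched pairs should score a lower loss than mismatched ones.
aligned = info_nce(np.eye(4), np.eye(4))
shuffled = info_nce(np.eye(4), np.roll(np.eye(4), 1, axis=0))
```

Under this reading, the u2i, i2i/u2u, and co-occurrence variants described in the abstract would differ mainly in how positive pairs are chosen (a user and an interacted item, an entity and its own CF representation, or co-occurring entities), while the loss shape stays the same. That mapping is an interpretation of the abstract, not a statement of the authors' implementation.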

Paper Structure

This paper contains 26 sections, 10 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: The MLLMs Semantic IDs application pipeline.
  • Figure 2: Comparison of Semantic IDs construction. (1) No-Aligned, (2) Two-Stage Aligned and (3) Ours: One-Stage Dual-Aligned.
  • Figure 3: The implementation of DAS. The UISM module uses RQ-VAEs for quantization, the ICDM module uses a disentangled debiasing network to obtain unbiased CF representations, and, while UISM and ICDM are co-trained, the MDAM module aligns the CF representations with the Semantic IDs.
  • Figure 4: Semantic and CF models alignment.
  • Figure 5: ID-based CF debias causal graph.
  • ...and 2 more figures