End-to-End Semantic ID Generation for Generative Advertisement Recommendation

Jie Jiang, Xinxun Zhang, Enming Zhang, Yuling Xiong, Jun Zhang, Jingwen Wang, Huan Yu, Yuxiang Wang, Hao Wang, Xiao Yan, Jiawei Jiang

TL;DR

This work addresses the limitations of two-stage Semantic ID (SID) generation in generative recommendation (GR) by introducing UniSID, an end-to-end framework that jointly learns SIDs and item embeddings directly from raw advertising data. It combines an advertisement-enhanced input schema, a unified SID-and-embedding generation mechanism built on a shared multimodal model, multi-granularity contrastive learning that aligns semantics at different SID levels, and summary-based ad reconstruction that injects high-level latent semantics. Empirically, UniSID consistently improves SID quality and downstream performance (next-ad prediction, ad retrieval, and next-item prediction) on industrial datasets and a public benchmark, and ablations confirm the contribution of joint optimization, the multi-granularity contrastive loss (MG Loss), and reconstruction. The approach shows strong generalization and offers practical benefits for scalable, semantically faithful SID generation in advertising and broader GR settings.
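The idea of aligning semantics at several SID levels can be illustrated with a generic per-level InfoNCE loss. This is a minimal sketch under stated assumptions, not the paper's MG Loss: it assumes each item has one representation per SID level, treats two views of the same item as a positive pair, uses the other in-batch items as negatives, and combines the per-level losses with hypothetical weights. The function names (`info_nce`, `multi_granularity_loss`), the toy vectors, and the temperature of 0.1 are illustrative choices.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, guarded against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, 1e-12)


def info_nce(anchors: Sequence[Vector], positives: Sequence[Vector],
             temperature: float = 0.1) -> float:
    """In-batch InfoNCE: anchors[i] should match positives[i];
    every positives[j] with j != i acts as a negative."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true pair
    return total / n


def multi_granularity_loss(anchor_levels: Sequence[Sequence[Vector]],
                           positive_levels: Sequence[Sequence[Vector]],
                           level_weights: Optional[List[float]] = None,
                           temperature: float = 0.1) -> float:
    """Weighted sum of per-level InfoNCE losses.

    anchor_levels[l][i] is item i's (hypothetical) representation at SID
    level l; level 0 is the coarsest granularity, the last level the finest.
    """
    num_levels = len(anchor_levels)
    if level_weights is None:
        level_weights = [1.0 / num_levels] * num_levels  # uniform by assumption
    return sum(w * info_nce(a, p, temperature)
               for w, a, p in zip(level_weights, anchor_levels, positive_levels))


# Toy batch: two items, two SID levels, 2-D representations.
anchors = [[[1.0, 0.0], [0.0, 1.0]],   # level 0 (coarse)
           [[0.6, 0.8], [0.8, -0.6]]]  # level 1 (fine)
aligned = multi_granularity_loss(anchors, anchors)
swapped = multi_granularity_loss(anchors, [level[::-1] for level in anchors])
print(aligned < swapped)  # True: matched pairs incur a much lower loss
```

Computing the loss at every level, rather than only on the full embedding, is what lets coarse SID prefixes and fine SID suffixes each receive their own alignment signal; how the paper actually defines positives per level is not specified here.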

Abstract

Generative Recommendation (GR) has achieved strong results by framing recommendation as next-token prediction. This paradigm relies on Semantic IDs (SIDs) to tokenize large-scale item collections into discrete token sequences. Existing GR approaches predominantly generate SIDs via Residual Quantization (RQ): items are first encoded into embeddings, which are then quantized into discrete SIDs. However, this paradigm suffers from two inherent limitations: 1) objective misalignment and semantic degradation stemming from the two-stage compression; and 2) error accumulation inherent in the multi-level structure of RQ. To address these limitations, we propose UniSID, a Unified SID generation framework for generative advertisement recommendation. Specifically, we jointly optimize embeddings and SIDs end-to-end from raw advertising data, enabling semantic information to flow directly into the SID space and thus avoiding the limitations of the two-stage cascaded compression paradigm. To capture fine-grained semantics, we introduce a multi-granularity contrastive learning strategy that aligns distinct items across SID levels. Finally, we propose a summary-based ad reconstruction mechanism that encourages SIDs to capture high-level semantic information not explicitly present in advertising contexts. Experiments show that UniSID consistently outperforms state-of-the-art SID generation methods, yielding up to a 4.62% improvement in Hit Rate across downstream advertising scenarios compared with the strongest baseline.
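To make the two-stage RQ paradigm concrete, the sketch below greedily quantizes an already-trained (frozen) item embedding into a multi-level SID. It is a generic illustration of residual quantization, not UniSID: the codebooks are tiny hand-picked toy values (real pipelines learn them, for example with an RQ-VAE), and the names `residual_quantize` and `TOY_CODEBOOKS` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(embedding: Vector,
                      codebooks: Sequence[Sequence[Vector]]) -> Tuple[List[int], List[float]]:
    """Greedily map a frozen embedding to one code index per level.

    Each level quantizes only the residual left by the previous levels, so a
    poor choice at an early (coarse) level cannot be undone later; this is the
    error-accumulation issue attributed to RQ.
    Returns the SID and the squared residual norm remaining after each level.
    """
    residual = list(embedding)
    sid: List[int] = []
    remaining_error: List[float] = []
    for codebook in codebooks:
        best = min(range(len(codebook)), key=lambda k: sq_dist(residual, codebook[k]))
        sid.append(best)
        residual = [r - c for r, c in zip(residual, codebook[best])]
        remaining_error.append(sum(r * r for r in residual))
    return sid, remaining_error


# Toy codebooks with shrinking scales, coarse to fine (hand-picked, not learned).
TOY_CODEBOOKS = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.2, 0.0], [0.0, 0.2]],
    [[0.05, 0.0], [0.0, 0.05]],
]
sid, remaining_error = residual_quantize([1.1, 0.25], TOY_CODEBOOKS)
print(sid)  # [0, 1, 0]
print(remaining_error)  # shrinks level by level but never reaches zero
```

Because the codebooks are fit only after the embedding model has been trained, nothing in this pipeline sees the recommendation objective, and whatever residual remains after the last level is simply lost. UniSID's end-to-end joint optimization is designed to close exactly that gap.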

Paper Structure (22 sections, 11 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: Two-stage cascaded compression of current methods and unified generation of SID of our method.
  • Figure 2: The framework of UniSID
  • Figure 3: Comparison between joint training and task-specific separate training on the Ad-60W dataset.
  • Figure 4: The impact of hyperparameter $\lambda$ on SID quality in the Ad-60W dataset.
  • Figure 5: Prompt example of the ad attributes summary.
  • ...and 1 more figure