From IDs to Semantics: A Generative Framework for Cross-Domain Recommendation with Adaptive Semantic Tokenization

Peiyu Hu, Wayne Lu, Jia Wang

TL;DR

GenCDR tackles cross-domain recommendation without relying on aligned IDs by learning transferable semantic IDs through Domain-adaptive Tokenization and a symmetric Cross-Domain Autoregressive framework. The method combines a universal semantic foundation with domain-specific adapters via a dynamic routing mechanism and employs a Domain-aware Prefix-tree to ensure efficient, valid generation. Empirical results on multiple real-world cross-domain datasets show state-of-the-art performance and strong generalization, with ablations confirming the importance of each component and efficiency analyses demonstrating scalable training and inference. The work advances cross-domain learning by integrating generative semantics, modular adapters, and constrained decoding to bridge universal and domain-specific knowledge in recommender systems.

Abstract

Cross-domain recommendation (CDR) is crucial for improving recommendation accuracy and generalization, yet traditional methods are often hindered by their reliance on shared user/item IDs, which are unavailable in most real-world scenarios. Consequently, many efforts have focused on learning disentangled representations through multi-domain joint training to bridge domain gaps. Although recent Large Language Model (LLM)-based approaches show promise, they still face critical challenges, including: (1) the item ID tokenization dilemma, which leads to vocabulary explosion and fails to capture high-order collaborative knowledge; and (2) insufficient domain-specific modeling of the complex evolution of user interests and item semantics. To address these limitations, we propose GenCDR, a novel Generative Cross-Domain Recommendation framework. GenCDR first employs a Domain-adaptive Tokenization module, which generates disentangled semantic IDs for items by dynamically routing between a universal encoder and domain-specific adapters. Symmetrically, a Cross-domain Autoregressive Recommendation module models user preferences by fusing universal and domain-specific interests. Finally, a Domain-aware Prefix-tree enables efficient and accurate generation. Extensive experiments on multiple real-world datasets demonstrate that GenCDR significantly outperforms state-of-the-art baselines. Our code is available in the supplementary materials.
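
The abstract does not spell out how the Domain-aware Prefix-tree is built, so the following is only a minimal sketch of the general idea of prefix-constrained decoding. It assumes one trie per domain over semantic-ID token tuples and uses greedy decoding for brevity. The names `DomainPrefixTree`, `allowed_next`, `constrained_greedy_decode`, and `score_fn` are hypothetical and not taken from the paper.

```python
from collections import defaultdict


class DomainPrefixTree:
    """One trie per domain over semantic-ID token tuples (assumed layout)."""

    def __init__(self):
        # domain name -> nested dict trie; each key is a token id
        self.roots = defaultdict(dict)

    def add(self, domain, semantic_id):
        """Register a valid semantic ID (a sequence of token ids) for a domain."""
        node = self.roots[domain]
        for token in semantic_id:
            node = node.setdefault(token, {})

    def allowed_next(self, domain, prefix):
        """Return the token ids that keep `prefix` on a valid path in `domain`."""
        node = self.roots.get(domain, {})
        for token in prefix:
            node = node.get(token)
            if node is None:  # prefix is not in this domain's trie
                return []
        return sorted(node)


def constrained_greedy_decode(tree, domain, score_fn, length):
    """Pick the best-scoring token at each step, restricted to valid tokens.

    `score_fn(prefix, token)` stands in for a model's next-token score.
    """
    prefix = []
    for _ in range(length):
        candidates = tree.allowed_next(domain, prefix)
        if not candidates:
            break
        prefix.append(max(candidates, key=lambda t: score_fn(prefix, t)))
    return tuple(prefix)


tree = DomainPrefixTree()
tree.add("books", (3, 1, 4))
tree.add("books", (3, 2, 0))
tree.add("movies", (3, 1, 9))

# Toy scorer that prefers larger token ids; a real model would supply logits.
toy_score = lambda prefix, token: token
result = constrained_greedy_decode(tree, "books", toy_score, length=3)
```

Because each step only considers children of the current trie node, every decoded sequence in this sketch corresponds to an item that exists in the target domain, which is the sense in which constrained decoding keeps generation valid.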

Paper Structure

This paper contains 22 sections, 11 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: An "Apple" across Lifestyle vs. Technology domains. Blue: shared semantics (e.g., daily use); Orange: domain-specific attributes (e.g., sweet, vitamins for fresh apple; fitness, smart for Apple Watch).
  • Figure 2: The architecture of our GenCDR framework. (a) The two-stage pipeline comprising the tokenization and recommendation modules. (b) The detailed structure of the Domain-adaptive Tokenization module, featuring a hierarchical adapter system with dynamic routing. (c) The symmetric architecture of the Cross-Domain Autoregressive Recommendation module.
  • Figure 3: t-SNE visualization of item embeddings in three different settings.
  • Figure 4: Sensitivity of LoRA fine-tuning to key hyper-parameters on the Cloth dataset.
  • Figure 5: Comparison of training efficiency using the Qwen2.5-7B model. The plots show (a) trainable parameters (log scale), (b) training time, and (c) peak GPU memory for our LoRA-based GenCDR versus a Full Fine-Tuning (Full FT) version.
  • ...and 1 more figure
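
The dynamic routing between a universal encoder and domain-specific adapters mentioned for Figure 2(b) can be pictured, very roughly, as a gated mixture of representations. The sketch below assumes a softmax gate over one universal path and any number of adapter paths. The paper's actual routing rule is not given in this summary, so `route`, `gate_logits`, and the blending formula are illustrative assumptions only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(universal_vec, adapter_vecs, gate_logits):
    """Blend a universal representation with domain-adapter outputs.

    `gate_logits` holds one entry for the universal path followed by one
    entry per adapter (assumed gate design, not the paper's).
    """
    weights = softmax(gate_logits)
    paths = [universal_vec] + adapter_vecs
    dim = len(universal_vec)
    return [sum(w * p[i] for w, p in zip(weights, paths)) for i in range(dim)]


# Equal gate logits give an even blend of the two paths.
blended = route([1.0, 0.0], [[0.0, 1.0]], [0.0, 0.0])

# A strongly universal-leaning gate nearly ignores the adapter.
mostly_universal = route([1.0, 0.0], [[0.0, 1.0]], [10.0, 0.0])
```

In a trained system the gate logits would come from a learned function of the item and its domain; here they are passed in directly to keep the example self-contained.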