From IDs to Semantics: A Generative Framework for Cross-Domain Recommendation with Adaptive Semantic Tokenization
Peiyu Hu, Wayne Lu, Jia Wang
TL;DR
GenCDR tackles cross-domain recommendation without relying on aligned IDs by learning transferable semantic IDs through Domain-adaptive Tokenization and a symmetric Cross-Domain Autoregressive framework. The method combines a universal semantic foundation with domain-specific adapters via a dynamic routing mechanism and employs a Domain-aware Prefix-tree to ensure efficient, valid generation. Empirical results on multiple real-world cross-domain datasets show state-of-the-art performance and strong generalization, with ablations confirming the importance of each component and efficiency analyses demonstrating scalable training and inference. The work advances cross-domain learning by integrating generative semantics, modular adapters, and constrained decoding to bridge universal and domain-specific knowledge in recommender systems.
Abstract
Cross-domain recommendation (CDR) is crucial for improving recommendation accuracy and generalization, yet traditional methods are often hindered by their reliance on shared user/item IDs, which are unavailable in most real-world scenarios. Consequently, many efforts have focused on learning disentangled representations through multi-domain joint training to bridge the domain gaps. Although recent Large Language Model (LLM)-based approaches show promise, they still face critical challenges, including: (1) the \textbf{item ID tokenization dilemma}, which leads to vocabulary explosion and fails to capture high-order collaborative knowledge; and (2) \textbf{insufficient domain-specific modeling} of the complex evolution of user interests and item semantics. To address these limitations, we propose \textbf{GenCDR}, a novel \textbf{Gen}erative \textbf{C}ross-\textbf{D}omain \textbf{R}ecommendation framework. GenCDR first employs a \textbf{Domain-adaptive Tokenization} module, which generates disentangled semantic IDs for items by dynamically routing between a universal encoder and domain-specific adapters. Symmetrically, a \textbf{Cross-domain Autoregressive Recommendation} module models user preferences by fusing universal and domain-specific interests. Finally, a \textbf{Domain-aware Prefix-tree} constrains decoding to valid item identifiers, enabling efficient and accurate generation. Extensive experiments on multiple real-world datasets demonstrate that GenCDR significantly outperforms state-of-the-art baselines. Our code is available in the supplementary materials.
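To make the idea of prefix-tree-constrained generation concrete, the following is a minimal sketch of how a per-domain trie over semantic-ID token sequences can restrict an autoregressive decoder to identifiers that correspond to real items. It is an illustration under stated assumptions, not the authors' implementation: the class `DomainPrefixTree`, the function `constrained_greedy_decode`, the one-root-per-domain layout, the toy semantic IDs, and the stand-in scoring function are all hypothetical, and the Domain-aware Prefix-tree described in the paper may be organized differently.

```python
from typing import Callable, Dict, List, Sequence


class DomainPrefixTree:
    """Trie over semantic-ID token sequences, with one root per domain.

    Hypothetical layout for illustration; nested dicts act as trie nodes.
    """

    def __init__(self) -> None:
        self.roots: Dict[str, dict] = {}

    def add_item(self, domain: str, semantic_id: Sequence[int]) -> None:
        """Register one item's semantic ID under the given domain."""
        node = self.roots.setdefault(domain, {})
        for token in semantic_id:
            node = node.setdefault(token, {})

    def allowed_next(self, domain: str, prefix: Sequence[int]) -> List[int]:
        """Return the tokens that extend `prefix` toward a valid item."""
        node = self.roots.get(domain, {})
        for token in prefix:
            node = node.get(token)
            if node is None:  # prefix does not exist in this domain
                return []
        return sorted(node)


def constrained_greedy_decode(
    tree: DomainPrefixTree,
    domain: str,
    score_fn: Callable[[Sequence[int], int], float],
    length: int,
) -> List[int]:
    """Greedy decoding that only considers tokens permitted by the trie.

    `score_fn(prefix, token)` stands in for the model's next-token score.
    """
    prefix: List[int] = []
    for _ in range(length):
        candidates = tree.allowed_next(domain, prefix)
        if not candidates:
            break
        prefix.append(max(candidates, key=lambda t: score_fn(prefix, t)))
    return prefix


# Toy catalogue: semantic IDs are made up for this example.
tree = DomainPrefixTree()
tree.add_item("books", (1, 4, 2))
tree.add_item("books", (1, 5, 0))
tree.add_item("movies", (1, 4, 7))

# Stand-in scorer that simply prefers larger token values.
toy_score = lambda prefix, token: float(token)

books_item = constrained_greedy_decode(tree, "books", toy_score, length=3)
movies_item = constrained_greedy_decode(tree, "movies", toy_score, length=3)
```

Because candidates are drawn only from the trie, every decoded sequence maps to an item that exists in the target domain, and each step scores a handful of children rather than the full token vocabulary. A beam-search variant would keep several prefixes per step and apply the same `allowed_next` filter to each.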
