Joint Identifiability of Cross-Domain Recommendation via Hierarchical Subspace Disentanglement
Jing Du, Zesheng Ye, Bin Guo, Zhiwen Yu, Lina Yao
TL;DR
This paper tackles cross-domain recommendation by addressing the lack of joint identifiability in cross-domain user representations. It proposes HJID, a hierarchical, generative, and causal-discovery framework that splits representations into shallow, general subspaces and deep, domain-oriented subspaces, aligning the former via Maximum Mean Discrepancy and disentangling the latter through a causal data generation graph and invertible flow mappings to recover a unique joint distribution $P(\boldsymbol{U}_{x}, \boldsymbol{U}_{y})$. The approach formalizes joint identifiability, introduces a stable-shared versus variant-domain factor decomposition, and demonstrates superior performance over state-of-the-art methods across six CDR tasks with both strong and weak domain correlations, including non-overlapped user scenarios. By enforcing a joint identifiability constraint and leveraging causal flow-based transformation of domain-specific factors, HJID provides robust cross-domain knowledge transfer and improved interpretability of shared versus domain-specific factors. The work has practical impact for robust multi-domain recommendation systems, especially in settings with distribution shifts and limited cross-domain overlap, and lays groundwork for extending causal subspace disentanglement to multi-domain contexts.
Abstract
Cross-Domain Recommendation (CDR) seeks to enable effective knowledge transfer across domains. Existing works rely on either representation alignment or transformation bridges, but they struggle to distinguish domain-shared from domain-specific latent factors. Specifically, while CDR describes user representations as a joint distribution over two domains, these methods fail to account for its joint identifiability because they focus primarily on the marginal distribution within a particular domain. As a result, they may overlook the conditionality between the two domains and how it contributes to latent factor disentanglement, leading to negative transfer when domains are weakly correlated. In this study, we explore what should and should not be transferred in cross-domain user representations from a causality perspective. We propose a Hierarchical subspace disentanglement approach to explore the Joint IDentifiability of the cross-domain joint distribution, termed HJID, which preserves domain-specific behaviors while separating them from domain-shared factors. HJID organizes user representations into layers: generic shallow subspaces and domain-oriented deep subspaces. We first encode the generic pattern in the shallow subspace by minimizing the Maximum Mean Discrepancy between the initial-layer activations of the two domains. Then, to dissect how domain-oriented latent factors are encoded in deeper-layer activations, we construct a cross-domain causality-based data generation graph, which identifies cross-domain consistent and domain-specific components, adhering to the Minimal Change principle. This allows HJID to maintain stability whilst discovering unique factors for different domains, all within a generative framework of invertible transformations that guarantees joint identifiability. With experiments on real-world datasets, we show that HJID outperforms SOTA methods on a range of strongly and weakly correlated CDR tasks.
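The shallow-subspace alignment described in the abstract relies on Maximum Mean Discrepancy (MMD), a kernel-based distance between two sample sets. The following is a minimal sketch of the standard biased MMD² estimator, not the authors' implementation. The RBF kernel, the bandwidth of 1.0, the synthetic Gaussian "activations", and all function names are assumptions chosen for illustration.

```python
import math
import random


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two vectors; the bandwidth is an assumed value."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimator: E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def sample_activations(n, dim, shift, rng):
    """Hypothetical stand-in for shallow-layer activations: Gaussian vectors."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
domain_x = sample_activations(100, 4, 0.0, rng)
domain_y_aligned = sample_activations(100, 4, 0.0, rng)  # same distribution as domain_x
domain_y_shifted = sample_activations(100, 4, 2.0, rng)  # mean shifted by 2 per dimension

mmd_aligned = mmd_squared(domain_x, domain_y_aligned)
mmd_shifted = mmd_squared(domain_x, domain_y_shifted)
print(f"aligned: {mmd_aligned:.4f}, shifted: {mmd_shifted:.4f}")
```

In this toy setting the shifted pair yields a clearly larger MMD² than the aligned pair, which is the quantity an alignment loss of this kind would push toward zero for the generic subspace. In practice the bandwidth is usually tuned or replaced by a mixture of kernels.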
