Partial Identifiability for Domain Adaptation
Lingjing Kong, Shaoan Xie, Weiran Yao, Yujia Zheng, Guangyi Chen, Petar Stojanov, Victor Akinwande, Kun Zhang
TL;DR
This work tackles unsupervised domain adaptation by addressing the identifiability of the cross-domain joint distribution. It proposes a latent-variable data-generating process that partitions the latent space into invariant components $\mathbf{z}_{c}$ and changing components $\mathbf{z}_{s}$. To model domain shifts under a minimal-change constraint, $\mathbf{z}_{s}$ is obtained from a high-level invariant variable $\tilde{\mathbf{z}}_{s}$ through a monotonic function. The authors prove partial identifiability of the changing components, the invariant subspace, and the joint distribution in the shared space, then implement iMSDA, a VAE-based framework with a flow model that recovers the latent variables and enables prediction in the target domain. Empirical results on synthetic data and real-world benchmarks (PACS and Office-Home) show state-of-the-art performance and support the theoretical identifiability claims, with ablations clarifying the roles of the loss terms and the dimension of the changing part. Overall, the paper provides a principled approach to domain alignment and target-domain prediction in multi-source UDA, with practical effectiveness demonstrated on these benchmarks.
Abstract
Unsupervised domain adaptation is critical to many real-world applications where label information is unavailable in the target domain. In general, without further assumptions, the joint distribution of the features and the label is not identifiable in the target domain. To address this issue, we rely on the property of minimal changes of causal mechanisms across domains to minimize unnecessary influences of distribution shifts. To encode this property, we first formulate the data-generating process using a latent variable model with two partitioned latent subspaces: invariant components whose distributions stay the same across domains and sparse changing components that vary across domains. We further constrain the domain shift to have a restrictive influence on the changing components. Under mild conditions, we show that the latent variables are partially identifiable, from which it follows that the joint distribution of data and labels in the target domain is also identifiable. Given the theoretical insights, we propose a practical domain adaptation framework called iMSDA. Extensive experimental results reveal that iMSDA outperforms state-of-the-art domain adaptation algorithms on benchmark datasets, demonstrating the effectiveness of our framework.
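To make the data-generating process described above concrete, here is a minimal Python sketch of a toy sampler. It is an illustration under stated assumptions, not the authors' implementation: the function name `sample_domain`, the Gaussian priors, the affine form of the monotonic map, and the orthogonal-plus-leaky-ReLU mixing function are all hypothetical choices made only for this example.

```python
import numpy as np


def sample_domain(n, u, n_c=2, n_s=2, seed=0):
    """Draw n toy samples for domain index u (hypothetical illustration)."""
    rng = np.random.default_rng(seed + 1000 * u)

    # Invariant components z_c: same distribution in every domain.
    z_c = rng.normal(size=(n, n_c))

    # High-level invariant variable z_s_tilde: also independent of the domain.
    z_s_tilde = rng.normal(size=(n, n_s))

    # Changing components z_s: a component-wise monotonic map of z_s_tilde.
    # Assumption: an affine map with positive slope that depends on u.
    scale = 1.0 + 0.5 * u
    shift = float(u)
    z_s = scale * z_s_tilde + shift

    # Shared invertible mixing function applied to z = [z_c, z_s].
    # Assumption: a fixed orthogonal matrix followed by a leaky ReLU.
    z = np.concatenate([z_c, z_s], axis=1)
    mix_rng = np.random.default_rng(123)  # fixed seed: same mixing in all domains
    q, _ = np.linalg.qr(mix_rng.normal(size=(n_c + n_s, n_c + n_s)))
    h = z @ q
    x = np.where(h > 0, h, 0.2 * h)

    return x, z_c, z_s


if __name__ == "__main__":
    for u in range(3):
        x, z_c, z_s = sample_domain(5000, u)
        print(u, x.shape, z_c.mean().round(2), z_s.mean().round(2))
```

In this toy setup only the mean and scale of the changing components differ between domains, while the invariant components and the mixing function are shared, which mirrors the minimal-change idea in a simplified form.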
