Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective
Yuan Yao, Xiaopu Zhang, Yu Zhang, Jian Jin, Qiang Yang
TL;DR
This work tackles semi-supervised heterogeneous domain adaptation (SHDA), where source and target domains have different feature representations and no sample-wise correspondences, with labeled source samples and largely unlabeled target samples. Through experiments on about 330 SHDA tasks, the authors show that source category information and source feature content have limited influence on target performance and, surprisingly, that noise drawn from simple distributions can carry transferable knowledge. They introduce the Knowledge Transfer Framework (KTF) to quantify transferable knowledge via a common subspace, and demonstrate, with supporting visualization analyses, that the transferability and discriminability of the source domain are the key drivers of improvement in the target domain. The findings suggest practical SHDA strategies that use noise as source samples or ensure source-domain discriminability and transferability, with implications for privacy-aware and source-free adaptation, and point to future work on theoretical foundations.
Abstract
Semi-supervised heterogeneous domain adaptation (SHDA) addresses learning across domains with distinct feature representations and distributions, where source samples are labeled while only a small fraction of target samples are labeled. Moreover, there is no one-to-one correspondence between source and target samples. Although various SHDA methods have been developed to tackle this problem, the nature of the knowledge transferred across heterogeneous domains remains unclear. This paper delves into this question from an empirical perspective. We conduct extensive experiments on about 330 SHDA tasks, employing two supervised learning methods and seven representative SHDA methods. Surprisingly, our observations indicate that neither the category nor the feature information of source samples significantly impacts the performance of the target domain. Additionally, noise drawn from simple distributions, when used as source samples, may contain transferable knowledge. Based on this insight, we perform a series of experiments to uncover the underlying principles of transferable knowledge in SHDA. Specifically, we design a unified Knowledge Transfer Framework (KTF) for SHDA. Based on the KTF, we find that the transferable knowledge in SHDA primarily stems from the transferability and discriminability of the source domain. Consequently, ensuring those properties in source samples, regardless of their origin (e.g., image, text, noise), can enhance the effectiveness of knowledge transfer in SHDA tasks. The code and datasets are available at https://github.com/yyyaoyuan/SHDA.
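
To make the idea of "noise as source samples" concrete, the following is a minimal sketch of one way labeled noise could be generated. It is not the authors' procedure: the helper name make_noise_source, the per-class Gaussian design, and the parameters class_sep and dim are all assumptions introduced here for illustration. The per-class centers are only a simple way to give the noise some discriminability, which the abstract identifies as one of the properties that matter.

```python
import random


def make_noise_source(num_classes, samples_per_class, dim, class_sep=3.0, seed=0):
    """Draw labeled noise samples, one Gaussian cluster per class.

    Hypothetical helper for illustration only; the paper's actual
    noise-generation procedure may differ.
    """
    rng = random.Random(seed)
    features, labels = [], []
    for c in range(num_classes):
        # Random class center. A larger class_sep tends to spread the
        # centers further apart, making the clusters easier to separate.
        center = [rng.gauss(0.0, class_sep) for _ in range(dim)]
        for _ in range(samples_per_class):
            # Each sample is its class center plus unit-variance Gaussian noise.
            features.append([m + rng.gauss(0.0, 1.0) for m in center])
            labels.append(c)
    return features, labels


# Example: 10 classes, 20 samples each, 300-dimensional features (all assumed values).
X_src, y_src = make_noise_source(num_classes=10, samples_per_class=20, dim=300)
print(len(X_src), len(X_src[0]), sorted(set(y_src)))
```

Such samples would then stand in for the labeled source domain passed to an SHDA method, alongside the few labeled and many unlabeled target samples. The source dimensionality need not match the target's, since SHDA assumes heterogeneous feature spaces.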
