Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective

Yuan Yao, Xiaopu Zhang, Yu Zhang, Jian Jin, Qiang Yang

TL;DR

This work studies semi-supervised heterogeneous domain adaptation (SHDA), where the source and target domains have different feature representations and no sample-wise correspondence; source samples are labeled, while target samples are largely unlabeled. Through experiments on about 330 SHDA tasks, the authors show that the category information and feature information of source samples have limited influence on target performance and, surprisingly, that noise drawn from simple distributions can carry transferable knowledge. They introduce the Knowledge Transfer Framework (KTF) to quantify transferable knowledge via a common subspace, and demonstrate, with supporting visualization analyses, that the transferability and discriminability of the source domain are the key drivers of improvement in the target domain. The findings suggest practical SHDA strategies that use noise as source samples or ensure source-domain discriminability and transferability, with implications for privacy-aware and source-free adaptation, and they point to future work on theoretical foundations.

Abstract

Semi-supervised heterogeneous domain adaptation (SHDA) addresses learning across domains with distinct feature representations and distributions, where source samples are labeled while only a small fraction of target samples are labeled. Moreover, there is no one-to-one correspondence between source and target samples. Although various SHDA methods have been developed to tackle this problem, the nature of the knowledge transferred across heterogeneous domains remains unclear. This paper examines this question from an empirical perspective. We conduct extensive experiments on about 330 SHDA tasks, employing two supervised learning methods and seven representative SHDA methods. Surprisingly, our observations indicate that neither the category information nor the feature information of source samples significantly impacts performance on the target domain. Additionally, noise drawn from simple distributions, when used as source samples, may contain transferable knowledge. Based on this insight, we perform a series of experiments to uncover the underlying principles of transferable knowledge in SHDA. Specifically, we design a unified Knowledge Transfer Framework (KTF) for SHDA. Based on the KTF, we find that the transferable knowledge in SHDA primarily stems from the transferability and discriminability of the source domain. Consequently, ensuring those properties in source samples, regardless of their origin (e.g., image, text, noise), can enhance the effectiveness of knowledge transfer in SHDA tasks. The code and datasets are available at https://github.com/yyyaoyuan/SHDA.
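
To make the idea of "noise used as source samples" concrete, the following is a minimal sketch, not the authors' procedure. It assumes the noise is Gaussian with one randomly placed center per category, so that the synthetic source domain is labeled and its classes are separable. The function name `make_noise_source` and the parameters `spread` and `center_scale` are hypothetical and do not come from the paper or its repository.

```python
import numpy as np


def make_noise_source(num_classes, samples_per_class, dim,
                      spread=1.0, center_scale=5.0, seed=0):
    """Build a labeled source domain made purely of Gaussian noise.

    Assumption (not from the paper): each category gets its own random
    center, and its samples are isotropic Gaussian noise around it.
    """
    rng = np.random.default_rng(seed)
    # One random center per category; a larger center_scale relative to
    # spread makes the categories easier to separate.
    centers = center_scale * rng.standard_normal((num_classes, dim))
    features = np.concatenate([
        center + spread * rng.standard_normal((samples_per_class, dim))
        for center in centers
    ])
    # Labels are grouped by category: 0, 0, ..., 1, 1, ...
    labels = np.repeat(np.arange(num_classes), samples_per_class)
    return features, labels


# Example: an 8-category noise source with 300-dimensional features.
X_src, y_src = make_noise_source(num_classes=8, samples_per_class=20, dim=300)
print(X_src.shape, y_src.shape)
```

A source domain built this way could stand in for the labeled text features in an SHDA method; the ratio of `center_scale` to `spread` is one rough way to vary how discriminable the synthetic categories are.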

Paper Structure

This paper contains 35 sections, 10 equations, 20 figures, 3 tables.

Figures (20)

  • Figure 1: Example scenario of SHDA with a textual source domain and a visual target domain. Here, all texts are labeled, but most images remain unlabeled, with only a small number having labels. Also, there is no one-to-one relationship between texts and images. We do not know what knowledge is transferred across heterogeneous domains.
  • Figure 2: Experimental results on the NUS-WIDE+ImageNet-8 dataset (Chua et al., 2009; Deng et al., 2009), which demonstrate that noise may contain transferable knowledge. Here, Text $\rightarrow$ Image is a vanilla SHDA task, whilst Noise $\rightarrow$ Image is a specialized SHDA task with pure noise as the source samples. In addition, SVMt and NNt are two supervised learning methods, whereas SHFA, CDLS, DDACL, TNT, STN, SSAN, and JMEA are seven SHDA methods.
  • Figure 3: In general, the SHDA pipeline integrates the classification adaptation and distribution alignment mechanisms to jointly learn the source and target feature projectors, along with the classifier, from scratch in a semi-supervised manner. Notably, the feature projectors are unique to each domain.
  • Figure 4: An illustration of the category-permutated SHDA task, where source and target samples have identical categories but with different orders of category indices.
  • Figure 5: The orders of category indices for source and target samples on all datasets. Here, we preserve the order of category indices for target samples while exclusively modifying that of source samples. Consequently, the task is considered as a vanilla SHDA task only when the category indices of both source and target samples are aligned in order 1.
  • ...and 15 more figures

Theorems & Definitions (1)

  • Definition 1