Overcoming Data Inequality across Domains with Semi-Supervised Domain Generalization

Jinha Park, Wonguk Cho, Taesup Kim

TL;DR

We propose ProUD, a novel algorithm that learns domain-invariant features via domain-aware prototypes and generalizes progressively via uncertainty-adaptive mixing of labeled and unlabeled domains.

Abstract

Although extensive datasets have driven considerable advances in machine learning, a significant disparity persists in the availability of data across sources and populations. This inequality across domains makes modeling difficult for those with limited data, which can raise profound practical and ethical concerns. In this paper, we address a representative case of the data inequality problem across domains, termed Semi-Supervised Domain Generalization (SSDG), in which only one domain is labeled while the rest are unlabeled. We propose a novel algorithm, ProUD, which learns domain-invariant features via domain-aware prototypes and generalizes progressively via uncertainty-adaptive mixing of labeled and unlabeled domains. Our experiments on three benchmark datasets demonstrate the effectiveness of ProUD, which outperforms all baseline models, including single domain generalization and semi-supervised learning methods. Source code will be released upon acceptance of the paper.

Paper Structure (24 sections, 7 equations, 4 figures, 5 tables, 1 algorithm)

Figures (4)

  • Figure 1: ProUD training examples with three different domains: one labeled domain (highlighted in yellow) and two unlabeled domains (indicated in blue and red). For the blue unlabeled domain, DaPP infers the correct pseudo-label, and a substantial portion of the unlabeled domain image ($\lambda_1=0.6$) is mixed with the labeled domain image through UDMix. This results in a domain-mixed image depicted in green. In contrast, for the red unlabeled domain, DaPP generates an incorrect pseudo-label, but only a minimal fraction of this image ($\lambda_2=0.1$) is mixed with the labeled domain image through UDMix, leading to a predominantly yellow domain-mixed image.
  • Figure 2: t-SNE visualizations of the learned representations of both samples and domain-aware prototypes using the PACS dataset, in the case where P is the labeled source domain, and C and S are the unlabeled source domains. Different colors and shapes represent distinct classes and domains, respectively. (a), (b), and (c) are produced right after the DaPP at epochs 1, 40, and 80, respectively.
  • Figure 3: Training curves of ProUD on the PACS dataset, with labeled source domain C, unlabeled source domains A and P, and the test domain S. Lambda and PL accuracy respectively represent the average value of the mixing ratio $\lambda$ and the accuracy of pseudo-labels across all samples within each of the unlabeled domains.
  • Figure 4: t-SNE visualizations of the learned representations of both samples and domain-aware prototypes without PML. P is the labeled source domain, and C and S are the unlabeled source domains from the PACS dataset. Different colors and shapes represent distinct classes and domains, respectively. (a), (b), and (c) are produced right after the DaPP at epochs 1, 40, and 80, respectively.
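
The mixing step described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes the mixed image is a pixel-wise convex combination in which $\lambda$ weights the unlabeled-domain image and $1-\lambda$ weights the labeled-domain image, which is how the caption reads; the exact UDMix formulation, and how $\lambda$ is derived from uncertainty, are not given here. The function name `mix_images` and the toy pixel values are hypothetical.

```python
def mix_images(labeled, unlabeled, lam):
    """Pixel-wise convex combination: lam * unlabeled + (1 - lam) * labeled.

    Assumption: lam is the share of the unlabeled-domain image, as suggested
    by the Figure 1 caption. Images are flat lists of pixel intensities.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(labeled) != len(unlabeled):
        raise ValueError("images must have the same number of pixels")
    return [lam * u + (1.0 - lam) * l for l, u in zip(labeled, unlabeled)]


# Toy images (hypothetical values): the two domains differ in every pixel.
labeled_img = [1.0, 1.0, 0.0, 0.0]
unlabeled_img = [0.0, 0.0, 1.0, 1.0]

# lambda_1 = 0.6: a large share of the unlabeled image enters the mix.
confident_mix = mix_images(labeled_img, unlabeled_img, lam=0.6)

# lambda_2 = 0.1: the mix stays close to the labeled image.
uncertain_mix = mix_images(labeled_img, unlabeled_img, lam=0.1)
```

With a small $\lambda$, a sample whose pseudo-label may be wrong contributes little to the mixed image, which matches the predominantly yellow result described for the red domain.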