Unlocking the Potential of Unlabeled Data in Semi-Supervised Domain Generalization

Dongkwan Lee, Kyomin Hwang, Nojun Kwak

TL;DR

This work tackles semi-supervised domain generalization (SSDG) by addressing the underutilization of unconfident-unlabeled data. It introduces UPCSC, a plug-and-play framework with two contrastive learning modules, Unlabeled Proxy-based Contrastive Learning (UPC) and Surrogate Class Learning (SC), which exploit all unlabeled samples without requiring domain labels. In experiments on four SSDG benchmarks, UPCSC consistently improves over strong SSL baselines and competing plug-and-play methods. Analyses also show enhanced class discriminability and reduced domain gaps. The approach demonstrates the practical value of leveraging all available unlabeled data for domain generalization under label scarcity, and the code is publicly available for replication.

Abstract

We address the problem of semi-supervised domain generalization (SSDG), where the distributions of train and test data differ and only a small amount of labeled data, along with a larger amount of unlabeled data, is available during training. Existing SSDG methods leverage only the unlabeled samples on which the model's predictions are highly confident (confident-unlabeled samples), which limits the full utilization of the available unlabeled data. To the best of our knowledge, we are the first to explore a method for incorporating the unconfident-unlabeled samples that were previously disregarded in the SSDG setting. To this end, we propose UPCSC, which consists of two modules for utilizing these unconfident-unlabeled samples: 1) the Unlabeled Proxy-based Contrastive learning (UPC) module, which treats unconfident-unlabeled samples as additional negative pairs, and 2) the Surrogate Class learning (SC) module, which generates positive pairs for unconfident-unlabeled samples using their confusing class set. These modules are plug-and-play, do not require any domain labels, and can easily be integrated into existing approaches. Experiments on four widely used SSDG benchmarks demonstrate that our approach consistently improves performance when attached to baselines and outperforms competing plug-and-play methods. We also analyze the role of our method in SSDG, showing that it enhances class-level discriminability and mitigates domain gaps. The code is available at https://github.com/dongkwani/UPCSC.
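The UPC idea (unconfident-unlabeled samples serving as extra negatives) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper. Unlabeled samples are split with a fixed max-probability threshold `tau`, a FixMatch-style rule assumed here. Each class has a proxy vector. The loss is a generic proxy-based InfoNCE for an anchor with a (pseudo-)label, and the embeddings of unconfident-unlabeled samples are appended to the denominator as additional negatives. The names `split_by_confidence` and `upc_loss`, as well as the values of `tau` and `temperature`, are hypothetical. The paper's exact formulation may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def normalize(v: Vector) -> List[float]:
    # L2-normalize so that dot products are cosine similarities
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def split_by_confidence(probs: List[List[float]], tau: float = 0.95) -> Tuple[List[int], List[int]]:
    """Split unlabeled samples by max predicted probability (assumed threshold rule)."""
    confident, unconfident = [], []
    for i, p in enumerate(probs):
        (confident if max(p) >= tau else unconfident).append(i)
    return confident, unconfident

def upc_loss(anchor: Vector, label: int, proxies: List[Vector],
             extra_negatives: List[Vector], temperature: float = 0.1) -> float:
    """Hypothetical UPC-style loss for one anchor with a (pseudo-)label.

    Positive: the proxy of `label`. Negatives: all other class proxies plus the
    embeddings of unconfident-unlabeled samples (`extra_negatives`).
    """
    z = normalize(anchor)
    logits = [dot(z, normalize(p)) / temperature for p in proxies]
    logits += [dot(z, normalize(u)) / temperature for u in extra_negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[label]

# Usage: two classes, one confident and one unconfident unlabeled sample
proxies = [[1.0, 0.0], [0.0, 1.0]]
conf, unconf = split_by_confidence([[0.97, 0.03], [0.55, 0.45]])
unconf_embeddings = [[0.7, 0.7]]  # embedding of the unconfident sample (made up)
loss_without = upc_loss([0.9, 0.1], 0, proxies, [])
loss_with = upc_loss([0.9, 0.1], 0, proxies, unconf_embeddings)
```

In this sketch, adding unconfident-unlabeled embeddings only enlarges the denominator, so the loss pushes the anchor away from them even though their labels are unknown.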

Paper Structure

This paper contains 28 sections, 7 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Visual illustration of sample usage differences between previous works and our method in the SSDG task.
  • Figure 2: Distribution of the number of classes among which unconfident-unlabeled samples are confused, for the samples reported in the observation-accuracy table. We define confusing classes as those whose confidence exceeds the random-chance threshold (1/number of classes). Notably, the model tends to confuse each sample among only a small subset of classes.
  • Figure 3: Overview of our UPCSC algorithm. UPCSC is a plug-and-play module designed to be added on top of SSL-based SSDG methods. To fully leverage unlabeled data in the SSDG setting, we propose two novel learning methods: Unlabeled Proxy-based Contrastive learning (UPC) and Surrogate Class learning (SC).
  • Figure 4: High-level idea of our method and terminology.
  • Figure 5: Average accuracy on unlabeled samples from the source domain in the PACS setting with 10 labels per class. Note that no data from the test (target) domain is used to compute this accuracy.
  • ...and 4 more figures
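The Figure 2 definition of confusing classes, and one way an SC-style module might turn that set into positives, can be sketched as follows. Only the 1/C threshold comes from the Figure 2 caption. The loss itself is an assumption: it treats the whole confusing set as a single surrogate class and minimizes the negative log of the softmax mass on that set, which is a multi-positive InfoNCE form. In this sketch, `logits` would be similarities between an unconfident-unlabeled embedding and the class proxies divided by a temperature. The names `confusing_classes` and `surrogate_class_loss` are hypothetical and are not the paper's API.

```python
import math
from collections import Counter
from typing import List, Sequence

def confusing_classes(probs: Sequence[float]) -> List[int]:
    """Classes whose confidence exceeds the random-chance threshold 1/C (Figure 2 definition)."""
    threshold = 1.0 / len(probs)
    return [c for c, p in enumerate(probs) if p > threshold]

def surrogate_class_loss(logits: Sequence[float], surrogate: Sequence[int]) -> float:
    """Assumed SC-style loss: -log of the softmax probability mass on the surrogate set."""
    m = max(logits)  # stabilize the exponentials
    exps = [math.exp(l - m) for l in logits]
    positive_mass = sum(exps[c] for c in surrogate)
    return -math.log(positive_mass / sum(exps))

# Usage: confusing sets for two 4-class predictions (threshold = 0.25)
probs = [[0.40, 0.35, 0.15, 0.10], [0.30, 0.30, 0.30, 0.10]]
sets = [confusing_classes(p) for p in probs]
set_size_histogram = Counter(len(s) for s in sets)  # the kind of statistic Figure 2 plots

logits = [2.0, 1.0, 0.0]
loss_single = surrogate_class_loss(logits, [0])
loss_surrogate = surrogate_class_loss(logits, [0, 1])
```

A surrogate set does not require the model to commit to one class, so the objective is easier to satisfy than a single hard pseudo-label. It still pulls the sample toward the few classes it is actually confused among.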