Domain Adaptation Using Pseudo Labels

Sachin Chhabra, Hemanth Venkateswara, Baoxin Li

TL;DR

This paper tackles unsupervised domain adaptation by addressing category misalignment that arises from marginal distribution alignment. It introduces DAPL, a simple pipeline that generates target pseudo labels via a Gaussian Mixture-based feature space, then progressively refines and filters them through Confidence, Conformity, and Consistency criteria before using them as supervision to adapt a source classifier. A per-epoch target supervision schedule and a time-varying loss weight enable gradual domain adaptation, yielding competitive results across Digits, VisDA, and Office-Home benchmarks while avoiding heavy domain-alignment losses. The findings show that high-quality pseudo labels, obtained through principled filtering, can match or exceed outcomes from more complex domain-alignment techniques, with practical impact for robust, data-efficient domain adaptation. The approach also highlights the importance of limiting confirmation bias and suggests avenues for future refinement when target data are scarce.

Abstract

In the absence of labeled target data, unsupervised domain adaptation approaches seek to align the marginal distributions of the source and target domains in order to train a classifier for the target. Unsupervised domain alignment procedures are category-agnostic and end up misaligning the categories. We address this problem by deploying a pretrained network to determine accurate labels for the target domain using a multi-stage pseudo-label refinement procedure. The filters are based on the confidence, distance (conformity), and consistency of the pseudo labels. Our results on multiple datasets demonstrate the effectiveness of our simple procedure in comparison with complex state-of-the-art techniques.
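
As an illustration of how the three filters could be chained, here is a minimal Python sketch. It is not the authors' implementation. The function name `filter_pseudo_labels`, the thresholds `conf_thresh`, `max_sigma`, and `min_agree`, the isotropic per-class Gaussian (one mean vector and one scalar spread per class), and the Euclidean distance test are all assumptions made for this example; the abstract states only that the filters use confidence, distance (conformity), and consistency.

```python
from math import sqrt


def filter_pseudo_labels(probs, feats, history, means, stds,
                         conf_thresh=0.9, max_sigma=2.0, min_agree=3):
    """Return (sample_index, pseudo_label) pairs that pass all three filters.

    probs   : per-sample class probabilities from the current model
    feats   : per-sample feature vectors
    history : per-sample list of labels predicted in earlier epochs
    means   : per-class mean feature vector (assumed isotropic Gaussian)
    stds    : per-class scalar spread (assumed isotropic Gaussian)
    All thresholds are hypothetical values chosen for illustration.
    """
    kept = []
    for i, (p, f, past) in enumerate(zip(probs, feats, history)):
        # Pseudo label = most probable class under the current model.
        label = max(range(len(p)), key=p.__getitem__)

        # Confidence: the winning probability must be high enough.
        if p[label] < conf_thresh:
            continue

        # Conformity: the feature must lie close to its class mean
        # (here: within max_sigma spreads, Euclidean distance).
        dist = sqrt(sum((a - b) ** 2 for a, b in zip(f, means[label])))
        if dist > max_sigma * stds[label]:
            continue

        # Consistency: the last min_agree predictions must all agree
        # with the current pseudo label.
        recent = past[-min_agree:]
        if len(recent) < min_agree or any(l != label for l in recent):
            continue

        kept.append((i, label))
    return kept


# Toy two-class example with hand-made values.
probs = [[0.95, 0.05], [0.60, 0.40], [0.97, 0.03], [0.02, 0.98], [0.10, 0.90]]
feats = [[0.1, 0.0], [0.0, 0.0], [5.0, 5.0], [1.0, 1.1], [1.0, 1.0]]
history = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [1, 0, 1], [1, 1, 1]]
means = [[0.0, 0.0], [1.0, 1.0]]
stds = [1.0, 1.0]

selected = filter_pseudo_labels(probs, feats, history, means, stds)
print(selected)  # [(0, 0), (4, 1)]
```

In this toy run, sample 1 fails the confidence test, sample 2 lies too far from its class mean, and sample 3 changed its prediction in a recent epoch, so only samples 0 and 4 would be promoted to the labeled set.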

Paper Structure (20 sections, 6 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Visualization of image features using t-SNE. The source domain is represented using $\bullet$ and the target domain using $\mathbf{\times}$. Different colors depict different categories. Best viewed in color. (a) ResNet101 trained on ImageNet (b) Pretrained ResNet101 overfitted to the source domain (c) DANN (Ganin et al., 2016) (d) Trained on source and target domains (e) Our method.
  • Figure 2: Our proposed framework. Source and target samples are input to the network. The network is trained on source data in a supervised way. For target adaptation, we train the network on the target pseudo labels. The pseudo labels obtained for the target are filtered with the Confidence, Conformity, and Consistency filters to get the most precise subset before they are promoted to the labeled dataset used for supervised training. The Confidence filter validates samples based on their output confidences. The Conformity filter verifies that a sample lies within the Gaussian region. The Consistency filter ensures that a sample's predicted output has remained consistent. Best viewed in color.
  • Figure 3: Confusion matrix after different filters for MNIST$\rightarrow$SVHN at $90\%$ training. (a) Unfiltered samples selected using Target Supervision Schedule, (b) after Confidence filter, (c) after Conformity filter, (d) after Consistency filter. The numbers at the top indicate target classification accuracy with the number of target pseudo labels within parentheses.
  • Figure 4: Training plots for the Real-World $\rightarrow$ Clipart experiment from the Office-Home dataset. Unfiltered-Acc and Filtered-Acc are the accuracies of pseudo labels without filtering (after the Target Supervision Schedule) and with DAPL filtering, respectively. Tgt-Acc is the target accuracy achieved with DAPL. Unfiltered-Count and Filtered-Count are the numbers of selected pseudo labels without filtering and with DAPL filtering, respectively.
  • Figure 5: Visualization of image features using t-SNE. The source domain is represented using $\bullet$ and the target domain using $\mathbf{\times}$. Different colors depict different categories. Left column represents the source only features and the right column displays features after our approach. Best viewed in color.