
Self-ensembling for visual domain adaptation

Geoffrey French, Michal Mackiewicz, Mark Fisher

TL;DR

This work extends self-ensembling via the mean teacher framework to visual domain adaptation, addressing the challenge of transferring knowledge from labeled source data to unlabeled target data. It introduces a two-path architecture with domain-specific batch normalization, plus strategies such as confidence thresholding, targeted data augmentation, and a class balance loss to stabilize training and counteract imbalanced target distributions. The method achieves state-of-the-art results on multiple small-image benchmarks and wins the VisDA-2017 challenge, often approaching supervised performance on digits datasets. Overall, the approach demonstrates that distribution alignment followed by robust, self-ensembled refinement can yield strong, architecture-agnostic domain adaptation performance.
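The confidence thresholding and class balance loss mentioned above can be sketched on plain probability vectors. This is a minimal illustration under stated assumptions, not the authors' code. It assumes three things:

- The self-ensembling (consistency) loss is a squared difference between student and teacher class probabilities.
- Samples whose teacher confidence (maximum class probability) falls below a threshold are masked out.
- The class balance loss is a binary cross-entropy between the student's mean prediction over a target mini-batch and a uniform distribution.

The function names and the default threshold of 0.968 are illustrative assumptions.

```python
import math


def confidence_mask(teacher_probs, threshold=0.968):
    """1.0 for samples whose teacher max-probability exceeds the threshold, else 0.0.
    (Hypothetical helper; the threshold value is an illustrative assumption.)"""
    return [1.0 if max(p) > threshold else 0.0 for p in teacher_probs]


def consistency_loss(student_probs, teacher_probs, threshold=0.968):
    """Mean squared difference between student and teacher predictions,
    with low-confidence samples masked out (contributing zero loss)."""
    mask = confidence_mask(teacher_probs, threshold)
    per_sample = [
        sum((s - t) ** 2 for s, t in zip(sp, tp)) / len(sp)
        for sp, tp in zip(student_probs, teacher_probs)
    ]
    # Average over the whole batch; masked samples add nothing.
    return sum(m * l for m, l in zip(mask, per_sample)) / len(per_sample)


def class_balance_loss(student_probs, eps=1e-7):
    """Binary cross-entropy between the batch-mean prediction and a uniform
    distribution, averaged over classes. Lowest when predictions are balanced."""
    n_classes = len(student_probs[0])
    mean_pred = [
        sum(p[c] for p in student_probs) / len(student_probs)
        for c in range(n_classes)
    ]
    u = 1.0 / n_classes  # uniform target probability for every class
    return -sum(
        u * math.log(q + eps) + (1 - u) * math.log(1 - q + eps) for q in mean_pred
    ) / n_classes
```

A batch whose predictions collapse onto one class yields a larger class balance loss than a uniform batch. This is how such a term can counteract imbalance in the target predictions.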

Abstract

This paper explores the use of self-ensembling for visual domain adaptation problems. Our technique is derived from the mean teacher variant (Tarvainen et al., 2017) of temporal ensembling (Laine et al., 2017), a technique that achieved state-of-the-art results in the area of semi-supervised learning. We introduce a number of modifications to their approach for challenging domain adaptation scenarios and evaluate their effectiveness. Our approach achieves state-of-the-art results in a variety of benchmarks, including our winning entry in the VisDA-2017 visual domain adaptation challenge. On small-image benchmarks, our algorithm not only outperforms prior art, but can also achieve accuracy close to that of a classifier trained in a supervised fashion.
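In the mean teacher framework referenced above, the teacher network is not trained by gradient descent. Instead, its weights track an exponential moving average (EMA) of the student's weights, and the student is encouraged to agree with the teacher on unlabeled data. The sketch below shows that update. It assumes weights flattened into a list of floats and a smoothing rate `alpha` of 0.99 (an illustrative value); the function name is hypothetical.

```python
def ema_update(teacher_weights, student_weights, alpha=0.99):
    """Return new teacher weights: an exponential moving average of the student's.

    alpha close to 1.0 means the teacher changes slowly, averaging over many
    recent student states (hypothetical helper; alpha is an assumed value).
    """
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_weights, student_weights)]


# Usage: after each student optimisation step, refresh the teacher.
teacher = [0.0, 1.0]
student = [1.0, 1.0]
teacher = ema_update(teacher, student)  # teacher moves 1% of the way toward student
```

A larger `alpha` makes the teacher a smoother, slower-moving average of recent students. That averaging over training time is what the "self-ensembling" name refers to.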

Paper Structure

This paper contains 22 sections, 1 equation, 3 figures, 8 tables.

Figures (3)

  • Figure 1: Images from the VisDA-17 domain adaptation challenge
  • Figure 2: The network structures of the original mean teacher model and our model. Dashed lines in the mean teacher model indicate that ground truth labels (and therefore the cross-entropy classification loss) are only available for labeled samples.
  • Figure 3: Small image domain adaptation example images
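Figure 2 contrasts the two network structures. The domain-specific batch normalization named in the TL;DR comes from passing source and target mini-batches through the student separately, so each batch is normalized with statistics from its own domain. Below is a toy sketch of that idea on scalar activations. It assumes no learned scale/shift parameters, and the function names are hypothetical.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalise a batch of scalar activations with the batch's own mean and
    variance (no learned scale/shift; illustrative simplification)."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


def two_path_normalise(source_batch, target_batch):
    """Normalise source and target batches separately, so each domain is
    standardised with its own statistics instead of a shared, mixed estimate."""
    return batch_norm(source_batch), batch_norm(target_batch)


# Usage: two domains whose raw activations sit at very different offsets.
src, tgt = two_path_normalise([0.0, 2.0], [10.0, 12.0])
# Both come out close to [-1.0, 1.0], despite the domain shift in the inputs.
```

Normalizing the two batches jointly would instead mix the domains' statistics. The source activations would then be shifted by the target mean rather than centred on their own.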