Self-ensembling for visual domain adaptation
Geoffrey French, Michal Mackiewicz, Mark Fisher
TL;DR
This work extends self-ensembling via the mean teacher framework to visual domain adaptation, addressing the challenge of transferring knowledge from labeled source data to unlabeled target data. It introduces a two-path architecture with domain-specific batch normalization, plus strategies such as confidence thresholding, targeted data augmentation, and a class balance loss to stabilize training and counteract imbalanced target distributions. The method achieves state-of-the-art results on multiple small-image benchmarks and won the VisDA-2017 challenge, often approaching supervised performance on the digit datasets. Overall, the approach demonstrates that distribution alignment followed by robust, self-ensembled refinement can yield strong, architecture-agnostic domain adaptation performance.
Abstract
This paper explores the use of self-ensembling for visual domain adaptation problems. Our technique is derived from the mean teacher variant (Tarvainen et al., 2017) of temporal ensembling (Laine et al., 2017), a technique that achieved state-of-the-art results in semi-supervised learning. We introduce a number of modifications to their approach for challenging domain adaptation scenarios and evaluate its effectiveness. Our approach achieves state-of-the-art results on a variety of benchmarks, including our winning entry in the VisDA-2017 visual domain adaptation challenge. On small-image benchmarks, our algorithm not only outperforms prior art, but can also achieve accuracy close to that of a classifier trained in a supervised fashion.
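As a rough illustration of the two ingredients named above, the sketch below shows a mean teacher weight update (the teacher is an exponential moving average of the student) and a consistency loss on unlabeled target samples that ignores samples whose teacher prediction falls below a confidence threshold. It is a minimal pure-Python sketch, not the authors' implementation: the function names, the flat weight lists, the squared-difference loss form, and the values `alpha=0.99` and `threshold=0.9` are illustrative assumptions rather than details taken from this section.

```python
import math


def softmax(logits):
    """Convert a list of logits into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ema_update(teacher_weights, student_weights, alpha=0.99):
    """Move each teacher weight toward the matching student weight.

    teacher <- alpha * teacher + (1 - alpha) * student
    alpha is an assumed, illustrative smoothing factor.
    """
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_weights, student_weights)]


def masked_consistency_loss(student_logits, teacher_logits, threshold=0.9):
    """Mean squared difference between student and teacher probabilities.

    Samples whose highest teacher probability is below `threshold`
    (an assumed value) contribute zero to the loss.
    Returns (loss averaged over all samples, number of samples kept).
    """
    total = 0.0
    kept = 0
    for s_log, t_log in zip(student_logits, teacher_logits):
        s_prob = softmax(s_log)
        t_prob = softmax(t_log)
        if max(t_prob) < threshold:
            continue  # teacher is not confident: mask this sample out
        kept += 1
        total += sum((s - t) ** 2 for s, t in zip(s_prob, t_prob)) / len(s_prob)
    n = len(student_logits)
    return (total / n if n else 0.0), kept
```

In a training loop, the student would be updated by gradient descent on the supervised source loss plus this consistency term, and `ema_update` would then be applied to the teacher after each step.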
