The Last Mile to Supervised Performance: Semi-Supervised Domain Adaptation for Semantic Segmentation
Daniel Morales-Brotons, Grigorios Chrysos, Stratis Tzoumas, Volkan Cevher
TL;DR
This work investigates semi-supervised domain adaptation (SSDA) for semantic segmentation, addressing the challenge of achieving supervised-level performance with minimal target annotations. It proposes a simple, effective framework that combines consistency regularization, supervised pixel contrastive learning, and iterative self-training within a mean-teacher setup to leverage source labels, unlabeled target data, and a small set of target labels. The approach achieves state-of-the-art results on GTA→Cityscapes in low-label regimes, approaching or surpassing fully supervised performance with 100–200 target labels, and it generalizes to Synthia and BDD scenarios. The work also analyzes how existing UDA and SSL methods perform in SSDA, offering design guidelines that prioritize tight clustering of target representations and domain robustness over explicit domain alignment. The findings highlight the practical value of SSDA for reducing annotation costs while delivering near-supervised performance in dense prediction tasks.
Abstract
Supervised deep learning requires massive labeled datasets, but obtaining annotations is not always easy or possible, especially for dense tasks like semantic segmentation. To overcome this issue, numerous works explore Unsupervised Domain Adaptation (UDA), which uses a labeled dataset from another domain (source), or Semi-Supervised Learning (SSL), which trains on a partially labeled set. Despite the success of UDA and SSL, reaching supervised performance at a low annotation cost remains a notoriously elusive goal. To address this, we study the promising setting of Semi-Supervised Domain Adaptation (SSDA). We propose a simple SSDA framework that combines consistency regularization, pixel contrastive learning, and self-training to effectively utilize a few target-domain labels. Our method outperforms prior art on the popular GTA-to-Cityscapes benchmark and shows that as few as 50 target labels can suffice to achieve near-supervised performance. Additional results on Synthia-to-Cityscapes, GTA-to-BDD, and Synthia-to-BDD further demonstrate the effectiveness and practical utility of the method. Lastly, we find that existing UDA and SSL methods are not well-suited for the SSDA setting and discuss design patterns to adapt them.
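
To make the self-training ingredient more concrete, the sketch below shows two generic building blocks of a mean-teacher setup: an exponential moving average (EMA) update of the teacher weights, and confidence-thresholded pseudo-labels for unlabeled target pixels. It is a minimal illustration, not the authors' implementation. The function names `ema_update` and `pseudo_labels`, the decay `alpha`, the confidence `threshold`, and the use of 255 as an ignore index are all assumptions made for this example; the text above does not specify any of them.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight toward the student weight by an EMA step.

    `teacher` and `student` are flat lists of floats standing in for model
    parameters (an assumption made to keep the sketch dependency-free).
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def pseudo_labels(probs, threshold=0.9, ignore_index=255):
    """Convert per-pixel class probabilities into hard pseudo-labels.

    Pixels whose top probability falls below `threshold` receive
    `ignore_index`, so a loss could skip them (hypothetical convention here).
    """
    labels = []
    for pixel_probs in probs:
        confidence = max(pixel_probs)
        if confidence >= threshold:
            labels.append(pixel_probs.index(confidence))
        else:
            labels.append(ignore_index)
    return labels


if __name__ == "__main__":
    # Teacher predictions for three pixels over two classes (made-up numbers).
    teacher_probs = [[0.95, 0.05], [0.6, 0.4], [0.1, 0.9]]
    print(pseudo_labels(teacher_probs))
    print(ema_update([1.0, 0.0], [0.0, 1.0], alpha=0.5))
```

In a typical training loop of this kind, the student would be trained on labeled source and target pixels plus the teacher's pseudo-labels, and `ema_update` would be applied after each optimizer step; whether the method described here follows exactly this schedule is not stated above.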
