Self-training via Metric Learning for Source-Free Domain Adaptation of Semantic Segmentation

Ibrahim Batuhan Akkaya, Ugur Halici

TL;DR

This paper tackles semantic segmentation under source-free domain adaptation (SFDA), where access to source data is restricted. It introduces Self-Training via Metric-learning (STvM), a mean-teacher framework augmented with a target-domain reliability metric learned through proxy-based metric learning, and a metric-guided online ClassMix (MOCM) augmentation. Reliability scores scale the pseudo-label gradients and govern patch-based mixing, so the student can learn from all teacher predictions while the influence of erroneous pseudo-labels is reduced. On GTA5-to-Cityscapes, SYNTHIA-to-Cityscapes, and Cityscapes-to-NTHU, STvM outperforms state-of-the-art SFDA methods and remains robust under varying hyper-parameters and random seeds. The combination of a learned reliability metric and MOCM provides a principled way to suppress label noise and strengthen supervision in a source-free setting, which is relevant to privacy-preserving semantic segmentation.

Abstract

Unsupervised source-free domain adaptation methods aim to train a model for the target domain using a pre-trained source-domain model and unlabeled target-domain data, particularly when access to source data is restricted due to intellectual property or privacy concerns. Traditional methods usually use self-training with pseudo-labeling, which is often subjected to thresholding based on prediction confidence. However, such thresholding limits the effectiveness of self-training due to insufficient supervision. This issue becomes more severe in a source-free setting, where supervision comes solely from the predictions of the pre-trained source model. In this study, we propose a novel approach that incorporates a mean-teacher model, wherein the student network is trained using all predictions from the teacher network. Instead of thresholding the predictions, we introduce a method that weights the gradients calculated from pseudo-labels according to the reliability of the teacher's predictions. To assess reliability, we introduce a novel approach using proxy-based metric learning. Our method is evaluated in synthetic-to-real and cross-city scenarios, demonstrating superior performance compared to existing state-of-the-art methods.
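
The reliability-weighting idea described in the abstract can be illustrated with a minimal sketch. It assumes each pixel already has a reliability score in [0, 1]; in the paper this score comes from the learned metric, which is not reproduced here. The function names and the toy inputs are hypothetical. Because the weight is treated as a constant, scaling a pixel's loss by it scales that pixel's gradient by the same factor.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_pseudo_label_loss(student_logits, teacher_probs, reliability):
    """Mean cross-entropy against the teacher's argmax pseudo-labels.

    Every pixel contributes (no confidence threshold); each pixel's term is
    scaled by its reliability weight, assumed here to lie in [0, 1].
    """
    total = 0.0
    for logits, t_probs, w in zip(student_logits, teacher_probs, reliability):
        # Pseudo-label: the class the teacher considers most likely.
        pseudo_label = max(range(len(t_probs)), key=t_probs.__getitem__)
        p = softmax(logits)
        total += w * -math.log(p[pseudo_label])
    return total / len(student_logits)


# Toy example: two pixels, two classes (all values are made up).
student_logits = [[0.0, 0.0], [0.0, 0.0]]
teacher_probs = [[0.9, 0.1], [0.4, 0.6]]
loss_all = weighted_pseudo_label_loss(student_logits, teacher_probs, [1.0, 1.0])
loss_down = weighted_pseudo_label_loss(student_logits, teacher_probs, [1.0, 0.0])
```

With all weights set to 1 the function reduces to ordinary pseudo-label cross-entropy; setting a weight to 0 removes that pixel's contribution, which is what a hard threshold would do for every low-confidence pixel at once.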

Paper Structure (21 sections, 5 equations, 2 figures, 7 tables)

Figures (2)

  • Figure 1: Segmentation performance of self-training when different percentiles of the predictions are used as pseudo-labels.
  • Figure 2: STvM comprises three networks: the teacher, the student, and the metric network, denoted $\mathcal{T}$, $\mathcal{S}$, and $\mathcal{M}$, respectively. The teacher and the student share the same segmentation network architecture, composed of a feature extractor $\mathcal{F}$ and a classifier $\mathcal{C}$. The metric network has the same architecture as $\mathcal{C}$ and is trained to learn a metric feature space. Following the mean-teacher approach, the student network is trained by backpropagation, whereas the parameters of the teacher are updated as a moving average of the student's parameters. The metric network $\mathcal{M}$ and the class proxies are also trained by backpropagation.
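
The moving-average teacher update mentioned in the Figure 2 caption can be sketched as follows. This is an illustrative exponential moving average over named scalar parameters, not the authors' implementation; the function name `ema_update`, the dictionary representation, and the momentum value are assumptions.

```python
def ema_update(teacher_params, student_params, momentum=0.99):
    """Return teacher parameters moved toward the student's by a moving average.

    teacher <- momentum * teacher + (1 - momentum) * student
    The momentum value is an assumption, not taken from the paper.
    """
    return {
        name: momentum * teacher_params[name] + (1.0 - momentum) * student_params[name]
        for name in teacher_params
    }


# Toy example with a single scalar parameter (values are made up).
teacher = {"w": 1.0}
student = {"w": 0.0}
teacher = ema_update(teacher, student, momentum=0.9)
```

Only the student receives gradients; the teacher changes solely through this update, so its predictions vary more smoothly across training steps than the student's.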