Cross-Domain Transfer Learning with CoRTe: Consistent and Reliable Transfer from Black-Box to Lightweight Segmentation Model

Claudia Cuttano, Antonio Tavera, Fabio Cermelli, Giuseppe Averta, Barbara Caputo

TL;DR

The paper addresses the practical challenge of transferring knowledge from a black-box semantic segmentation model to a lightweight target model using only unlabeled target data, without access to source data. It introduces CoRTe, which combines Robust Relative Confidence Pseudo-Labelling to extract reliable pseudo-labels, a target-domain pseudo-label refinement via an EMA-based teacher, and a consistency-regularized training scheme with strong augmentations. Empirical results on GTA5→Cityscapes and SYNTHIA→Cityscapes show CoRTe outperforming baseline black-box methods and approaching or matching the Target-only upper bound in several cases, illustrating effective domain transfer under restrictive data access. The approach enables reliable, efficient deployment of segmentation models on edge devices where source data and transparent models are not accessible, with practical impact for privacy-preserving and commercial settings.

Abstract

Many practical applications require training of semantic segmentation models on unlabelled datasets and their execution on low-resource hardware. Distillation from a trained source model may represent a solution for the first but does not account for the different distribution of the training data. Unsupervised domain adaptation (UDA) techniques claim to solve the domain shift, but in most cases assume the availability of the source data or an accessible white-box source model, which in practical applications are often unavailable for commercial and/or safety reasons. In this paper, we investigate a more challenging setting in which a lightweight model has to be trained on a target unlabelled dataset for semantic segmentation, under the assumption that we have access only to black-box source model predictions. Our method, named CoRTe, consists of (i) a pseudo-labelling function that extracts reliable knowledge from the black-box source model using its relative confidence, (ii) a pseudo label refinement method to retain and enhance the novel information learned by the student model on the target data, and (iii) a consistent training of the model using the extracted pseudo labels. We benchmark CoRTe on two synthetic-to-real settings, demonstrating remarkable results when using black-box models to transfer knowledge on lightweight models for a target data distribution.

Paper Structure (12 sections, 6 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: With CoRTe, we can train a low-resource model on unlabelled target data by extracting knowledge from a pre-trained source model that is accessible only via an input-output API. During the knowledge transfer, neither the source data nor the source model itself is directly accessible.
  • Figure 2: An overview of the proposed CoRTe framework. Our approach involves using a teacher model $\mathcal{S}$ to predict labels for an unlabelled target image $x$. To filter the predictions, we introduce a Robust Relative Confidence Pseudo-Labelling method that preserves pixels where the relative confidence of the model is above a threshold. The resulting pseudo label $\mathcal{M}$ is then further refined using the Label Self-Refinement technique, which leverages the knowledge gained by the student lightweight model. Finally, the refined pseudo label $\mathcal{M_R}$ is used as the ground truth for training.
  • Figure 3: In this example, we demonstrate the Robust Relative Confidence Pseudo-Labelling strategy (R$^2$CP), which begins by extracting the Relative Confidence Map $\mathcal{I}_x$ from the source model's prediction on the target image $\mathcal{S}(x)$. The Relative Confidence (RC) is computed for each pixel using \ref{eq:relative_conf}. Finally, we apply \ref{eq:rrcp} to identify the set of reliable pixels to be retained in the Pseudo Label $\mathcal{M}$.
  • Figure 4: Graphical interpretation of the label generation process. At the top left, a target image from Cityscapes (a) and its corresponding label (b). We query the teacher model and obtain its prediction for the target image (c). Our robust pseudo-labelling module exploits the relative confidence of the teacher model to filter out the uncertain pixels (black in d). During the training process, the increasing knowledge of the student on the target data is used to automatically refine the pseudo label (e) resulting in the final label for training the model (f).
  • Figure 5: Self-label refinement. Visual representation of our refined pseudo label at different steps of the training (from left to right: 0, 1.5k, 5k, and 80k steps). Intuitively, at the very beginning of the training, the teacher's prediction is filtered to remove the uncertain pixels and used to train the student. During training, the student gradually increases its confidence, and its own predictions can be used to refine the pseudo label with our Label Self-Refinement module.
  • ...and 1 more figure
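
The captions for Figures 2 and 3 describe keeping only the pixels whose relative confidence exceeds a threshold. The following is a minimal NumPy sketch of that filtering step, not the authors' implementation. The equations themselves are not reproduced in this listing, so the sketch assumes that relative confidence is the gap between the two largest class probabilities at each pixel. The function names, the `threshold` value, and the `IGNORE_INDEX` marker are all hypothetical.

```python
import numpy as np

# Hypothetical marker for pixels dropped from the pseudo label.
IGNORE_INDEX = 255


def relative_confidence(probs: np.ndarray) -> np.ndarray:
    """Per-pixel confidence map from class probabilities of shape (C, H, W).

    ASSUMPTION: relative confidence is taken to be the difference between
    the largest and second-largest class probability at each pixel.
    """
    # Sort along the class axis (ascending) and keep the two largest values.
    top2 = np.sort(probs, axis=0)[-2:]
    return top2[1] - top2[0]


def pseudo_label(probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Keep the arg-max class where the confidence gap exceeds `threshold`.

    All other pixels are set to IGNORE_INDEX so that a training loss can
    skip them. The default threshold is illustrative only.
    """
    confidence = relative_confidence(probs)
    labels = probs.argmax(axis=0)
    return np.where(confidence > threshold, labels, IGNORE_INDEX)


if __name__ == "__main__":
    # Toy black-box output: 3 classes on a 1x2 image.
    # Pixel 0 is confidently class 0; pixel 1 is ambiguous.
    probs = np.array([
        [[0.90, 0.40]],
        [[0.05, 0.35]],
        [[0.05, 0.25]],
    ])
    print(pseudo_label(probs, threshold=0.5))
```

In this toy input the first pixel has a gap of 0.85 and keeps its label, while the second has a gap of 0.05 and is marked as ignored. Only the output probabilities of the source model are needed, which matches the black-box setting described in the abstract.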