ECAP: Extensive Cut-and-Paste Augmentation for Unsupervised Domain Adaptive Semantic Segmentation

Erik Brorsson, Knut Åkesson, Lennart Svensson, Kristofer Bengtsson

TL;DR

This work tackles pseudo-label noise in unsupervised domain adaptation for semantic segmentation by introducing ECAP, an extensive cut-and-paste augmentation that leverages a memory bank of high-confidence pseudo-labeled target patches. ECAP builds per-class memory banks, selects high-confidence samples via a class-aware sampler, and constructs composite training images by pasting target content into source contexts before a DACS-style mixing step. Empirically, ECAP improves MIC on GTA→Cityscapes and Synthia→Cityscapes (notably reaching 69.1 mIoU on the latter) while revealing limitations in low-visibility domains where context cues are critical. Overall, ECAP demonstrates that memory-based, multi-sample augmentation can mitigate pseudo-label noise and push SOTA in synthetic-to-real UDA, with caveats about context reliance and domain characteristics.

Abstract

We consider unsupervised domain adaptation (UDA) for semantic segmentation in which the model is trained on a labeled source dataset and adapted to an unlabeled target dataset. Unfortunately, current self-training methods are susceptible to misclassified pseudo-labels resulting from erroneous predictions. Since certain classes are typically associated with less reliable predictions in UDA, reducing the impact of such pseudo-labels without skewing the training towards some classes is notoriously difficult. To this end, we propose an extensive cut-and-paste strategy (ECAP) to leverage reliable pseudo-labels through data augmentation. Specifically, ECAP maintains a memory bank of pseudo-labeled target samples throughout training and cut-and-pastes the most confident ones onto the current training batch. We implement ECAP on top of the recent method MIC and boost its performance on two synthetic-to-real domain adaptation benchmarks. Notably, MIC+ECAP reaches an unprecedented performance of 69.1 mIoU on the Synthia→Cityscapes benchmark. Our code is available at https://github.com/ErikBrorsson/ECAP.
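The memory-bank idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `MemoryBank`, the function `paste`, the `capacity` parameter, and the use of mean pixel confidence as the ranking score are all assumptions made for illustration, and the sketch always picks the single most confident sample rather than using a class-aware sampler.

```python
import numpy as np


class MemoryBank:
    """Hypothetical per-class store of pseudo-labeled target samples."""

    def __init__(self, num_classes, capacity=50):
        self.capacity = capacity  # assumed limit per class, not from the paper
        self.banks = {c: [] for c in range(num_classes)}

    def add(self, image, pseudo_label, confidence):
        """Store one target image under every class its pseudo-label contains.

        image: (H, W, 3) array, pseudo_label: (H, W) integer class ids,
        confidence: (H, W) teacher confidence in [0, 1].
        """
        for c in np.unique(pseudo_label):
            c = int(c)
            if c not in self.banks:  # e.g. an ignore index
                continue
            mask = pseudo_label == c
            # Assumed score: mean confidence over the pixels of class c.
            score = float(confidence[mask].mean())
            bank = self.banks[c]
            bank.append((score, image, mask))
            bank.sort(key=lambda sample: sample[0], reverse=True)
            del bank[self.capacity:]  # keep only the most confident samples

    def most_confident(self, c):
        """Return the highest-scoring (score, image, mask) for class c, or None."""
        bank = self.banks[c]
        return bank[0] if bank else None


def paste(image, label, sample, c):
    """Copy the class-c pixels of a stored sample onto a training image."""
    _, sample_image, mask = sample
    out_image = image.copy()
    out_label = label.copy()
    out_image[mask] = sample_image[mask]
    out_label[mask] = c
    return out_image, out_label
```

In this sketch the pasted region overwrites both the pixels and the labels of the training image, so the supervision for that region comes from a stored, high-confidence pseudo-label rather than from the current prediction.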

Paper Structure (18 sections, 9 equations, 6 figures, 5 tables, 1 algorithm)

Figures (6)

  • Figure 1: Schematic illustration of ECAP, which consists of a memory bank, a sampler, and an augmentation module and is integrated with the self-training framework. The inputs to ECAP are a source image $x^S$ and a target image $x^T$, along with the label $y^S$ and the pseudo-label $\tilde{y}^T$ produced by the teacher $g_{\Phi}$. The augmentation module processes these inputs together with samples from the memory bank to generate a mixed image $x^M$, an associated label $y^M$, and a weight $q^M$. Simultaneously, $x^T$ and $\tilde{y}^T$ are added to the memory bank for future use. During training, the student network $f_{\theta}$ processes both the source image $x^S$ and the mixed image $x^M$ and is supervised by the loss $\mathcal{L}^S(y^{S}, f_{\theta}(x^S)) + \mathcal{L}^T(y^M, f_{\theta}(x^M), q^M )$.
  • Figure 2: Qualitative comparison of MIC and MIC$\dagger$+ECAP on Synthia$\rightarrow$Cityscapes (row 1), as well as MIC and MIC+ECAP on GTA$\rightarrow$Cityscapes (row 2), Cityscapes$\rightarrow$DarkZurich (row 3), and Cityscapes$\rightarrow$ACDC (row 4).
  • Figure 3: Predictions on Cityscapes validation images following training on Synthia$\rightarrow$Cityscapes (row 1) and GTA$\rightarrow$Cityscapes (row 2).
  • Figure 4: The five most confident samples of the classes traffic sign (row 1), rider (row 2), and bus (row 3) in the ECAP memory bank. The samples are present in the memory bank at the end of training in the MIC+ECAP run on Synthia$\rightarrow$Cityscapes with median performance.
  • Figure 5: Two images (row 1) in the memory bank of class rider that have been assigned inaccurate pseudo-labels (row 2) during the MIC+ECAP run on GTA$\rightarrow$Cityscapes with median performance.
  • ...and 1 more figure
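
The training objective named in the Figure 1 caption, $\mathcal{L}^S(y^{S}, f_{\theta}(x^S)) + \mathcal{L}^T(y^M, f_{\theta}(x^M), q^M)$, can be sketched as follows. The caption does not spell out the form of the two terms; this sketch assumes both are pixel-wise cross-entropies and that $q^M$ acts as a per-pixel weight on the mixed-image term. The function names `pixel_cross_entropy` and `total_loss` are hypothetical.

```python
import numpy as np


def pixel_cross_entropy(probs, labels):
    """Per-pixel cross-entropy.

    probs: (H, W, C) softmax outputs of the student, labels: (H, W) class ids.
    Returns an (H, W) array of losses.
    """
    h, w = labels.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    picked = probs[rows, cols, labels]  # probability of the labeled class
    return -np.log(np.clip(picked, 1e-12, 1.0))


def total_loss(probs_source, y_source, probs_mixed, y_mixed, q_mixed):
    """Assumed form of L^S + L^T: source CE plus q^M-weighted CE on the mixed image."""
    loss_source = pixel_cross_entropy(probs_source, y_source).mean()
    loss_mixed = (q_mixed * pixel_cross_entropy(probs_mixed, y_mixed)).mean()
    return loss_source + loss_mixed
```

Under this assumption, pixels with a low weight in `q_mixed` contribute little to the gradient, which is how unreliable pseudo-labels would be down-weighted.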