Pseudolabel guided pixels contrast for domain adaptive semantic segmentation

Jianzi Xiang, Cailu Wan, Zhu Cao

TL;DR

A novel framework called Pseudo-label Guided Pixel Contrast (PGPC) is proposed, which overcomes the disadvantages of previous methods and can enhance the performance of other UDA approaches without increasing model complexity.

Abstract

Semantic segmentation is essential for comprehending images, but it requires a substantial amount of detailed pixel-level annotation, which can be costly to acquire in the real world. Unsupervised domain adaptation (UDA) for semantic segmentation trains a model on labeled virtual data and adapts it to unlabeled real data. Some recent works use contrastive learning, a powerful method for self-supervised learning, to support this adaptation. However, these works do not take into account the diversity of features within each class when applying contrastive learning, which leads to errors in class prediction. We analyze the limitations of these works and propose a novel framework called Pseudo-label Guided Pixel Contrast (PGPC), which overcomes the disadvantages of previous methods. We also investigate how to use more information from target images without adding noise from pseudo-labels. We test our method on two standard UDA benchmarks and show that it outperforms existing methods. Specifically, based on DAFormer, we achieve relative improvements of 5.1% mIoU and 4.6% mIoU on the Grand Theft Auto V (GTA5) to Cityscapes and SYNTHIA to Cityscapes tasks, respectively. Furthermore, our approach can enhance the performance of other UDA approaches without increasing model complexity. Code is available at https://github.com/embar111/pgpc

Paper Structure (15 sections, 11 equations, 6 figures, 12 tables, 1 algorithm)

This paper contains 15 sections, 11 equations, 6 figures, 12 tables, 1 algorithm.

Figures (6)

  • Figure 1: The schematic diagram of our motivation. (a) and (b) are the pixel-to-pixel and pixel-to-prototype contrastive learning methods, respectively. (c) is our contrastive learning method. (d) is the prediction confidence of the pixel marked with an orange cross. The adapted anchor pixel means the anchor pixel is trained by contrastive learning. The adapted anchor pixel is misplaced on the incorrect side of the decision boundary in (a) and (b), while (c) correctly positions it on the right side.
  • Figure 2: An overview of PGPC. The student model, guided by the fundamental segmentation loss $\mathcal{L}_s$, produces predictions on the source data. Concurrently, the teacher model provides an estimation of the pseudo-labels for the target data. Subsequently, the student model issues predictions on the target data, under the oversight of the weighted segmentation loss $\mathcal{L}_t$. In addition to the segmentation losses, the contrast loss $\mathcal{L}_c$ is introduced to encourage the alignment of pixel features in the embedding space.
  • Figure 3: An overview of the pixel contrast framework. Reliable target pixel features are sampled from the target embedding space; the distance between these features and their class prototypes is then decreased while the distance between them and negative features is increased. To fully leverage the pixels from target images, unreliable target pixel features are also introduced into the contrastive learning during training.
  • Figure 4: Qualitative examples of semantic segmentation on the GTA5 to Cityscapes task. (a) and (f) are the images and the ground truth labels from the Cityscapes validation dataset. (b), (c), and (d) are the segmentation predictions of SePiCo, DAFormer, and HRDA. (e) shows the segmentation outputs of our method. We highlight areas where our method performs better with colored boxes.
  • Figure 5: The visualization of our method (PGPC) and DAFormer. Different colors represent distinct categories, and these colors are consistent with the category label colors. We highlight some categories where our method performs better than DAFormer with colored circles.
  • ...and 1 more figure
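
The Figure 3 caption describes pulling reliable target pixel features toward their class prototypes while pushing them away from negative features. The NumPy sketch below illustrates that idea with an InfoNCE-style loss. It is only an illustration under stated assumptions, not the authors' implementation. The function name, the confidence threshold, the temperature, and the explicit bank of negative features are all hypothetical choices that the captions do not specify, and the handling of unreliable pixels is not shown.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def prototype_contrast_loss(feats, pseudo_labels, confidence, prototypes,
                            neg_feats, neg_labels,
                            conf_threshold=0.9, temperature=0.1):
    """InfoNCE-style loss for reliable target pixels (illustrative only).

    feats:         (N, D) target pixel embeddings
    pseudo_labels: (N,)   integer class ids from the teacher
    confidence:    (N,)   teacher confidence per pixel
    prototypes:    (C, D) one prototype per class
    neg_feats:     (K, D) candidate negative features (assumed bank)
    neg_labels:    (K,)   integer class ids of the candidates
    """
    # Keep only pixels whose pseudo-label is considered reliable (assumed rule).
    reliable = confidence >= conf_threshold
    if not reliable.any():
        return 0.0

    anchors = l2_normalize(feats[reliable])      # (M, D)
    labels = pseudo_labels[reliable]             # (M,)
    protos = l2_normalize(prototypes)            # (C, D)
    negs = l2_normalize(neg_feats)               # (K, D)

    # Positive: similarity to the prototype of the pixel's pseudo-label class.
    pos_logit = np.sum(anchors * protos[labels], axis=1) / temperature

    # Negatives: similarity to bank features from other classes only.
    neg_logits = anchors @ negs.T / temperature  # (M, K)
    same_class = labels[:, None] == neg_labels[None, :]
    neg_logits = np.where(same_class, -np.inf, neg_logits)

    # Numerically stable -log(exp(pos) / (exp(pos) + sum(exp(neg)))).
    all_logits = np.concatenate([pos_logit[:, None], neg_logits], axis=1)
    row_max = all_logits.max(axis=1, keepdims=True)
    log_sum_exp = row_max[:, 0] + np.log(np.exp(all_logits - row_max).sum(axis=1))
    return float(np.mean(log_sum_exp - pos_logit))
```

In this sketch the loss is small when a reliable pixel already lies close to the prototype of its pseudo-label class and grows when it lies closer to features of other classes. Pixels below the confidence threshold are simply skipped here.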