
PRCL: Probabilistic Representation Contrastive Learning for Semi-Supervised Semantic Segmentation

Haoyu Xie, Changqi Wang, Jian Zhao, Yang Liu, Jun Dan, Chong Fu, Baigui Sun

TL;DR

PRCL (Probabilistic Representation Contrastive Learning) is a robust contrastive-learning framework for semi-supervised semantic segmentation (S4). It improves the robustness of the unsupervised training process and introduces Global Distribution Prototypes (GDP), built by gathering all probabilistic representations (PRs) throughout the whole training process.

Abstract

Tremendous breakthroughs have been made in Semi-Supervised Semantic Segmentation (S4) through contrastive learning. However, due to limited annotations, the guidance on unlabeled images is generated by the model itself; this guidance inevitably contains noise and disturbs the unsupervised training process. To address this issue, we propose a robust contrastive-based S4 framework, termed the Probabilistic Representation Contrastive Learning (PRCL) framework, to enhance the robustness of the unsupervised training process. We model each pixel-wise representation as a Probabilistic Representation (PR) via a multivariate Gaussian distribution and tune the contribution of ambiguous representations to tolerate the risk of inaccurate guidance in contrastive learning. Furthermore, we introduce Global Distribution Prototypes (GDP) by gathering all PRs throughout the whole training process. Since the GDP contains the information of all representations of the same class, it is robust to instantaneous noise in representations and captures the intra-class variance of representations. In addition, we generate Virtual Negatives (VNs) from the GDP to take part in the contrastive learning process. Extensive experiments on two public benchmarks demonstrate the superiority of our PRCL framework.
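To make the idea of a probabilistic representation concrete, the sketch below treats each pixel representation as a diagonal Gaussian (a mean vector and a variance vector) and compares two of them with the mutual likelihood score, a standard closed-form similarity between Gaussians. The abstract does not state which similarity PRCL uses, so the choice of this score, the function name `mutual_likelihood_score`, and the toy values are all assumptions made for illustration.

```python
import math


def mutual_likelihood_score(mu_a, var_a, mu_b, var_b):
    """Similarity between two diagonal Gaussians N(mu_a, var_a) and N(mu_b, var_b).

    Assumption: this score is used here only as an illustrative similarity;
    the abstract does not specify the measure used by PRCL.
    """
    total = 0.0
    for ma, va, mb, vb in zip(mu_a, var_a, mu_b, var_b):
        summed_var = va + vb
        # Squared distance is scaled down when either side is uncertain.
        total += (ma - mb) ** 2 / summed_var + math.log(summed_var)
    return -0.5 * total - 0.5 * len(mu_a) * math.log(2.0 * math.pi)


# Two pairs whose means disagree by the same amount (toy values).
confident_clash = mutual_likelihood_score([1.0, 0.0], [0.1, 0.1], [-1.0, 0.0], [0.1, 0.1])
ambiguous_clash = mutual_likelihood_score([1.0, 0.0], [1.0, 1.0], [-1.0, 0.0], [1.0, 1.0])

# Two pairs whose means agree exactly.
confident_match = mutual_likelihood_score([1.0, 0.0], [0.1, 0.1], [1.0, 0.0], [0.1, 0.1])
ambiguous_match = mutual_likelihood_score([1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0])
```

With this score, a disagreement between two high-variance (ambiguous) representations is penalised less than the same disagreement between two low-variance ones, and an agreement between confident representations scores higher than one between ambiguous representations. This is one way the contribution of ambiguous representations can be tuned down when the guidance may be inaccurate.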

Paper Structure

This paper contains 25 sections, 26 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Contrast between two types of representations and prototypes. A point prototype is the prototype of deterministic representations, and a distribution prototype is the prototype of probabilistic representations. Unlike the conventional representation, we introduce probability and thus regard each representation as a multivariate Gaussian distribution. The probabilistic representation reduces, to some extent, the ambiguity of the mapping between representations and prototypes, and enhances the robustness of the model when training on fuzzy pixels.
  • Figure 2: Our PRCL framework tackles the negative effects of prototype shift and fragmentary negative distribution (L1 and L2) with our proposed global distribution prototype and virtual negatives (S1 and S2).
  • Figure 3: Overall framework of PRCL. The training pipeline contains two input streams: labeled images (black arrows) and unlabeled images (black dashed arrows). In the pixel space, the model is guided by the combination of ground-truth $y^l$ and original pseudo-labels $y^u$. In the latent space, the model maps the pixels into probabilistic representations $\bm{z}\sim \mathcal{N}(\bm{\mu}, \bm{\sigma}^2)$ via two heads: $h(\cdot)$ and $p(\cdot)$. The GDP is stored in a prototype-level dictionary and is updated with the local prototype. We generate the virtual negatives (VN, dashed cross) from the GDP for the contrastive loss $\mathcal{L}_{c}$.
  • Figure 4: Visualisation on PASCAL VOC 2012. All models are trained with 92 labeled images. The differences are highlighted in yellow boxes.
  • Figure 5: Visualisation on Cityscapes. All models are trained with 186 labeled images. The differences are highlighted in yellow boxes.
  • ...and 6 more figures
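The caption of Figure 3 describes a prototype-level dictionary that is updated with local prototypes and from which virtual negatives are generated. A minimal sketch of that bookkeeping follows. The update rule shown (precision-weighted fusion of two diagonal Gaussians) and the sampling rule (drawing from the stored Gaussians of the other classes) are assumptions chosen for illustration, as the captions do not give the exact formulas; the names `fuse_gaussians`, `update_gdp`, and `sample_virtual_negatives` are hypothetical.

```python
import math
import random


def fuse_gaussians(mu_a, var_a, mu_b, var_b):
    """Precision-weighted fusion of two diagonal Gaussians (assumed update rule)."""
    mu, var = [], []
    for ma, va, mb, vb in zip(mu_a, var_a, mu_b, var_b):
        fused_var = 1.0 / (1.0 / va + 1.0 / vb)
        var.append(fused_var)
        # Each mean is weighted by its precision (inverse variance).
        mu.append(fused_var * (ma / va + mb / vb))
    return mu, var


def update_gdp(gdp, class_id, local_mu, local_var):
    """Merge a local (per-batch) prototype into the global dictionary in place."""
    if class_id not in gdp:
        gdp[class_id] = (list(local_mu), list(local_var))
    else:
        global_mu, global_var = gdp[class_id]
        gdp[class_id] = fuse_gaussians(global_mu, global_var, local_mu, local_var)


def sample_virtual_negatives(gdp, anchor_class, num_per_class, rng):
    """Draw samples from the stored Gaussians of every class except the anchor's."""
    negatives = []
    for class_id, (mu, var) in gdp.items():
        if class_id == anchor_class:
            continue
        for _ in range(num_per_class):
            negatives.append(
                [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)]
            )
    return negatives


# Toy usage: two local prototypes for class 0, one for class 1.
gdp = {}
update_gdp(gdp, 0, [0.0, 0.0], [1.0, 1.0])
update_gdp(gdp, 0, [2.0, 2.0], [1.0, 1.0])
update_gdp(gdp, 1, [5.0, 5.0], [0.01, 0.01])
virtual_negatives = sample_virtual_negatives(gdp, anchor_class=0, num_per_class=3, rng=random.Random(0))
```

Under this particular fusion rule the stored variance shrinks as more local prototypes are merged, so the global prototype becomes less sensitive to any single noisy batch. Because the virtual negatives are drawn from the dictionary, they do not require storing individual pixel representations from earlier iterations.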