AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation

Haonan Wang, Qixiang Zhang, Yi Li, Xiaomeng Li

TL;DR

AllSpark sheds new light on architecture-level designs for SSSS rather than framework-level ones. This avoids increasingly complicated training pipelines, and the method outperforms existing methods across all evaluation protocols on the Pascal, Cityscapes and COCO benchmarks without bells and whistles.

Abstract

Semi-supervised semantic segmentation (SSSS) alleviates the burden of time-consuming pixel-level manual labeling by leveraging a limited amount of labeled data together with a larger amount of unlabeled data. Current state-of-the-art methods train on labeled data with ground truths and on unlabeled data with pseudo labels. However, the two training flows are separate, which allows labeled data to dominate the training process, resulting in low-quality pseudo labels and, consequently, sub-optimal results. To alleviate this issue, we present AllSpark, which regenerates ("reborns") the labeled features from unlabeled ones using a channel-wise cross-attention mechanism. We further introduce a Semantic Memory along with a Channel Semantic Grouping strategy to ensure that unlabeled features adequately represent labeled features. AllSpark sheds new light on architecture-level designs for SSSS rather than framework-level ones, which avoids increasingly complicated training pipeline designs. It can also be regarded as a flexible bottleneck module that can be seamlessly integrated into a general transformer-based segmentation model. AllSpark outperforms existing methods across all evaluation protocols on the Pascal, Cityscapes and COCO benchmarks without bells and whistles. Code and model weights are available at: https://github.com/xmed-lab/AllSpark.
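
To make the channel-wise cross-attention idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes that queries come from the labeled features and keys/values from the unlabeled features, that attention is computed between channels rather than between spatial positions, and that a plain scaled softmax is used. The function name, the projection matrices `w_q`, `w_k`, `w_v`, and the shapes are all hypothetical choices for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def channel_cross_attention(h_labeled, h_unlabeled, w_q, w_k, w_v):
    """Illustrative channel-wise cross-attention (assumed formulation).

    h_labeled, h_unlabeled: (C, d) arrays, C channels and d flattened positions.
    w_q, w_k, w_v: (d, d) projection matrices (hypothetical parameters).
    Returns the regenerated labeled features (C, d) and the (C, C) attention map.
    """
    q = h_labeled @ w_q      # queries from labeled features, (C, d)
    k = h_unlabeled @ w_k    # keys from unlabeled features, (C, d)
    v = h_unlabeled @ w_v    # values from unlabeled features, (C, d)
    d = q.shape[1]
    # Entry (i, j) scores labeled channel i against unlabeled channel j.
    scores = q @ k.T / np.sqrt(d)
    attn = softmax(scores, axis=-1)
    # Each output channel is a weighted mix of unlabeled value channels.
    return attn @ v, attn


rng = np.random.default_rng(0)
C, d = 8, 16
h_l = rng.standard_normal((C, d))
h_u = rng.standard_normal((C, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
reborn, attn = channel_cross_attention(h_l, h_u, w_q, w_k, w_v)
print(reborn.shape, attn.shape)
```

Because each row of the attention map sums to one, every regenerated labeled channel in this sketch is a convex combination of unlabeled value channels, which is one way to read "labeled features reborn from unlabeled ones".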

Paper Structure (15 sections, 3 equations, 10 figures, 7 tables, 1 algorithm)

Figures (10)

  • Figure 1: (a)(b) Comparison between the training data flows of previous methods and ours. Previous methods separate the labeled and unlabeled data training flows, which leads to dominance of the labeled data. (c) The labeled-data dominance issue in a previous method (e.g., UniMatch). The red margin denotes how much the labeled data overwhelm the unlabeled data; a larger gray margin indicates that the model may over-fit to the labeled data.
  • Figure 2: Illustration of the core idea of AllSpark, which leverages the unlabeled features to reborn the labeled ones. The regenerated labeled features exhibit a high level of precision, yet they also possess diversity compared to the original features.
  • Figure 3: Illustration of the proposed AllSpark, which can be regarded as a flexible bottleneck plugged into the middle of a general segmentation model. In the training stage, the unlabeled features are replaced by the Semantic Memory (see the section on the Semantic Memory and Figure 4, bottom). Moreover, the probability maps are used for the Channel Semantic Grouping strategy (see the section on Channel Semantic Grouping and Figure 4, top). In the inference stage, the cross-attention degrades to self-attention, with the hidden features of the test images as inputs.
  • Figure 4: Illustration of the Class-wise Semantic Memory Bank and the Channel-wise Semantic Grouping. $Sim_{i,j}$ denotes the similarity between the $i^{th}$ channel of the unlabeled hidden feature and the $j^{th}$ probability map. The dashed lines give visual examples of some channels. Taking the column in the red box as an example, $h_0$ has the largest similarity with $p_1$ ($Sim_{0,1}$), so it is added to the Class 1 slot of the Semantic Memory.
  • Figure 5: Visualization of labeled feature channels before and after the AllSpark with the same indexes. The features before AllSpark focus on more similar regions, while those after AllSpark focus on different objects or context.
  • ...and 5 more figures
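
The grouping rule described in the Figure 4 caption (assign each unlabeled channel to the class whose probability map it resembles most, then store it in that class's memory slot) can be sketched as follows. This is only an illustration under stated assumptions: the caption does not name the similarity measure, so cosine similarity is assumed here; the fixed-capacity first-in-first-out slots, the `SemanticMemory` class and the `group_channels` function are hypothetical names; and the probability maps are assumed to be already resized to the feature resolution and flattened.

```python
from collections import deque

import numpy as np


def group_channels(h_unlabeled, prob_maps, eps=1e-8):
    """Assign each channel to its most similar class probability map.

    h_unlabeled: (C, d) unlabeled hidden feature, one row per channel.
    prob_maps:   (K, d) class probability maps, one row per class.
    Returns Sim with shape (C, K) and the assigned class index per channel.
    Cosine similarity is an assumption made for this sketch.
    """
    h = h_unlabeled / (np.linalg.norm(h_unlabeled, axis=1, keepdims=True) + eps)
    p = prob_maps / (np.linalg.norm(prob_maps, axis=1, keepdims=True) + eps)
    sim = h @ p.T                      # sim[i, j] plays the role of Sim_{i,j}
    return sim, sim.argmax(axis=1)


class SemanticMemory:
    """Class-wise memory with one bounded FIFO slot per class (assumed design)."""

    def __init__(self, num_classes, slot_size):
        self.slots = [deque(maxlen=slot_size) for _ in range(num_classes)]

    def update(self, h_unlabeled, assignment):
        # Append each channel to the slot of its assigned class.
        for channel, cls in zip(h_unlabeled, assignment):
            self.slots[int(cls)].append(channel.copy())


# Toy example: 3 channels, 2 classes, 4 spatial positions.
prob_maps = np.array([[1.0, 1.0, 0.0, 0.0],    # p_0
                      [0.0, 0.0, 1.0, 1.0]])   # p_1
h_u = np.array([[0.1, 0.0, 0.9, 1.0],          # h_0, resembles p_1
                [1.0, 0.8, 0.0, 0.1],          # h_1, resembles p_0
                [0.0, 0.2, 0.7, 0.6]])         # h_2, resembles p_1
sim, assignment = group_channels(h_u, prob_maps)
memory = SemanticMemory(num_classes=2, slot_size=4)
memory.update(h_u, assignment)
print(assignment.tolist())
```

In this toy input, `h_0` is most similar to `p_1`, mirroring the red-box example in the caption, so it lands in the Class 1 slot.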