Using Unreliable Pseudo-Labels for Label-Efficient Semantic Segmentation

Haochen Wang, Yuchao Wang, Yujun Shen, Junsong Fan, Yuxi Wang, Zhaoxiang Zhang

TL;DR

This work separates reliable and unreliable pixels via the entropy of predictions, pushes each unreliable pixel into a category-wise queue of negative keys, and trains the model with all candidate pixels, making fuller use of unlabeled data.

Abstract

The crux of label-efficient semantic segmentation is to produce high-quality pseudo-labels to leverage a large amount of unlabeled or weakly labeled data. A common practice is to select highly confident predictions as the pseudo-ground-truths for each pixel, but this may leave most pixels unused because their predictions are unreliable. We argue that every pixel matters to model training, even unreliable and ambiguous ones. Intuitively, an unreliable prediction may be confused among the top classes, yet it should be confident that the pixel does not belong to the remaining classes. Hence, such a pixel can be convincingly treated as a negative key for those most unlikely categories. We therefore develop an effective pipeline to make sufficient use of unlabeled data. Concretely, we separate reliable and unreliable pixels via the entropy of predictions, push each unreliable pixel into a category-wise queue that consists of negative keys, and train the model with all candidate pixels. To account for how predictions evolve during training, we adaptively adjust the threshold for the reliable-unreliable partition. Experimental results on various benchmarks and training settings demonstrate the superiority of our approach over state-of-the-art alternatives.
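The partition described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the `unreliable_fraction` parameter, the linear decay in `linear_schedule`, and the `num_top` cutoff are all assumptions introduced here, since the abstract only says that the split uses prediction entropy and that the threshold is adjusted adaptively.

```python
import math


def pixel_entropy(probs):
    """Shannon entropy of one pixel's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def split_by_entropy(prob_map, unreliable_fraction):
    """Split pixel indices into (reliable, unreliable) lists.

    prob_map: list of per-pixel probability vectors.
    unreliable_fraction: share of highest-entropy pixels treated as
    unreliable (hypothetical parameter, assumed for this sketch).
    """
    entropies = [pixel_entropy(p) for p in prob_map]
    # Sort pixel indices from lowest to highest entropy.
    order = sorted(range(len(prob_map)), key=lambda i: entropies[i])
    n_unreliable = int(round(unreliable_fraction * len(prob_map)))
    cut = len(prob_map) - n_unreliable
    return sorted(order[:cut]), sorted(order[cut:])


def linear_schedule(initial_fraction, epoch, total_epochs):
    """Assumed schedule: shrink the unreliable share linearly over training."""
    return initial_fraction * (1.0 - epoch / total_epochs)


def negative_classes(probs, num_top):
    """Classes outside the pixel's top-`num_top` predictions.

    An unreliable pixel can serve as a negative key for these classes.
    """
    ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    return sorted(ranked[num_top:])


# Four toy pixels over four classes.
prob_map = [
    [0.97, 0.01, 0.01, 0.01],  # confident
    [0.45, 0.45, 0.05, 0.05],  # ambiguous between classes 0 and 1
    [0.90, 0.05, 0.03, 0.02],  # fairly confident
    [0.30, 0.30, 0.30, 0.10],  # ambiguous among classes 0, 1, 2
]
reliable, unreliable = split_by_entropy(prob_map, unreliable_fraction=0.5)
```

In this toy input, pixels 1 and 3 have the highest entropy and end up in the unreliable set, and `negative_classes(prob_map[1], num_top=2)` selects classes 2 and 3, the ones the ambiguous pixel is confident it does not belong to.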

Paper Structure (72 sections, 31 equations, 7 figures, 17 tables, 1 algorithm)

Figures (7)

  • Figure 1: Category-wise performance and statistics on the number of pixels with reliable and unreliable predictions. The model is trained using $732$ labeled images on PASCAL VOC 2012 \cite{voc} and evaluated on the remaining $9,850$ images.
  • Figure 2: Illustration of unreliable pseudo-labels. (a) Pixel-wise entropy predicted from an unlabeled image. (b) Pixel-wise pseudo-labels from reliable predictions only, where pixels within the white region are not assigned a pseudo-label. (c) Category-wise probability of a reliable prediction (i.e., the yellow cross). (d) Category-wise probability of an unreliable prediction (i.e., the white cross), which hovers between motorbike and person, yet is confident enough that it does not belong to car or train.
  • Figure 3: Illustration of U$^{\text{2}}$PL+. Segmentation predictions are first split into reliable and unreliable ones based on their pixel-level entropy. The reliable predictions are used as pseudo-labels and to compute category-wise prototypes. Each unreliable prediction is pushed into a category-wise memory bank and regarded as a negative key for its unlikely classes. Pixels in each memory bank are regarded as negative samples for the corresponding class, as formulated in Eq. (\ref{eq:contraloss}).
  • Figure 4: Qualitative results on the PASCAL VOC 2012 val set. All models are trained under the $1/4$ partition protocol of the blender set, which contains $2,646$ labeled images and $7,396$ unlabeled images. (a) Input images. (b) Labels for the corresponding image. (c) Only labeled images are used for training without any unlabeled data. (d) Predictions from our conference version U$^{\text{2}}$PL. (e) Predictions from U$^{\text{2}}$PL+.
  • Figure 5: Qualitative results on the Cityscapes val set. All models are trained under the $1/2$ partition protocol, which contains $1,488$ labeled images and $1,487$ unlabeled images. (a) Input images. (b) Hand-annotated labels for the corresponding image. (c) Only labeled images are used for training. (d) Predictions from our conference version, i.e., U$^{\text{2}}$PL \cite{wang2022semi}. (e) Predictions from U$^{\text{2}}$PL+. Yellow rectangles highlight the improvement gained by adequately using unreliable pseudo-labels.
  • ...and 2 more figures
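
The category-wise memory bank mentioned in the Figure 3 caption can be pictured as one bounded first-in-first-out queue per class. The sketch below is only an illustration under assumptions: the class name `CategoryMemoryBank`, the `capacity` limit, and the use of opaque feature objects are hypothetical and are not taken from the paper.

```python
from collections import deque


class CategoryMemoryBank:
    """One bounded FIFO queue of negative keys per class (illustrative)."""

    def __init__(self, num_classes, capacity):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.queues = [deque(maxlen=capacity) for _ in range(num_classes)]

    def push(self, feature, unlikely_classes):
        """Store an unreliable pixel's feature as a negative key
        for every class it is unlikely to belong to."""
        for c in unlikely_classes:
            self.queues[c].append(feature)

    def negatives(self, class_index):
        """Return the current negative keys for one class, oldest first."""
        return list(self.queues[class_index])


# Toy usage: three classes, at most two negatives kept per class.
bank = CategoryMemoryBank(num_classes=3, capacity=2)
bank.push("f1", unlikely_classes=[1, 2])
bank.push("f2", unlikely_classes=[2])
bank.push("f3", unlikely_classes=[2])  # evicts "f1" from class 2's queue
```

A bounded queue keeps memory use fixed and favors recent features; whether the original method uses this exact eviction policy is not stated in the captions above.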