Weighting Pseudo-Labels via High-Activation Feature Index Similarity and Object Detection for Semi-Supervised Segmentation

Prantik Howlader, Hieu Le, Dimitris Samaras

TL;DR

This work tackles the problem of unreliable pseudo-labels in semi-supervised semantic segmentation with a two-step approach: first, identify reliable pseudo-label pixels through an ensemble of a segmentation model and a trained object detector; second, assign per-pixel learning weights derived from rank statistics against class prototypes built from labeled and reliable pseudo-label pixels. The per-pixel weight is defined as $W^{PL}_i = s/k$ with $k=5$, where $s$ counts the top-$k$ activation indices shared by the pixel feature and its class prototype. This weight modulates the unsupervised loss $L_u$, which is combined with the standard supervised loss $L_s$ as $L = L_s + \alpha L_u$. The class prototypes are stored in a memory bank, enabling robust, noise-tolerant weighting, especially in the early training stages. The method is designed to be drop-in compatible with four strong SSL frameworks (AugSeg, AEL, U2PL, UniMatch) and yields consistent improvements on Cityscapes and Pascal VOC, as well as demonstrating adaptability to Transformer-based models and MS COCO. Collectively, these contributions provide a practical path to more reliable semi-supervised segmentation in low-label regimes and across diverse architectures and datasets.
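
The weight $W^{PL}_i = s/k$ can be illustrated with a minimal sketch. It assumes that the pixel feature and the class prototype are plain vectors of equal length, and that ties among equal activations are broken by the lower index; how the prototype itself is computed is not covered here. The function names `top_k_indices` and `pseudo_label_weight` are hypothetical and not taken from the paper.

```python
from typing import Sequence, Set


def top_k_indices(feature: Sequence[float], k: int) -> Set[int]:
    """Return the indices of the k largest activations in a feature vector.

    Assumption: ties are broken by the lower index (sorted() is stable).
    """
    order = sorted(range(len(feature)), key=lambda i: feature[i], reverse=True)
    return set(order[:k])


def pseudo_label_weight(
    pixel_feature: Sequence[float],
    class_prototype: Sequence[float],
    k: int = 5,
) -> float:
    """Compute s / k, where s is the number of top-k indices shared by the
    pixel feature and the prototype of its pseudo-label class."""
    if len(pixel_feature) != len(class_prototype):
        raise ValueError("feature and prototype must have the same length")
    if not 0 < k <= len(pixel_feature):
        raise ValueError("k must be between 1 and the feature dimension")
    shared = top_k_indices(pixel_feature, k) & top_k_indices(class_prototype, k)
    return len(shared) / k


# Toy 8-dimensional vectors (illustrative values only).
pixel = [0.9, 0.1, 0.8, 0.05, 0.7, 0.6, 0.2, 0.5]      # top-5 indices: {0, 2, 4, 5, 7}
prototype = [0.8, 0.7, 0.9, 0.1, 0.6, 0.05, 0.5, 0.2]  # top-5 indices: {0, 1, 2, 4, 6}
print(pseudo_label_weight(pixel, prototype, k=5))  # 0.6 (indices 0, 2, 4 are shared)
```

Because only the positions of the largest activations are compared, not their magnitudes, the weight is unchanged by any rescaling of the feature that preserves the ordering of its entries.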

Abstract

Semi-supervised semantic segmentation methods leverage unlabeled data by pseudo-labeling them. The success of these methods therefore hinges on the reliability of the pseudo-labels. Existing methods mostly choose high-confidence pixels in an effort to avoid erroneous pseudo-labels. However, high confidence does not guarantee correct pseudo-labels, especially in the initial training iterations. In this paper, we propose a novel approach to reliably learn from pseudo-labels. First, we unify the predictions from a trained object detector and a semantic segmentation model to identify reliable pseudo-label pixels. Second, we assign different learning weights to pseudo-labeled pixels to avoid noisy training signals. To determine these weights, we first use the reliable pseudo-label pixels identified in the first step, together with labeled pixels, to construct a prototype for each class. Then, the per-pixel weight is the structural similarity between the pixel and the prototype, measured via rank-statistics similarity. This metric is robust to noise, making it better suited for comparing features from unlabeled images, particularly in the initial training phases where wrong pseudo-labels are prone to occur. We show that our method can be easily integrated into four semi-supervised semantic segmentation frameworks and improves them on both the Cityscapes and Pascal VOC datasets.

Paper Structure (24 sections, 4 equations, 17 figures, 9 tables)

Figures (17)

  • Figure 1: Per-pixel Learning Weight Visualization (heat-map). Our per-pixel learning weight reduces the weight on unreliable high-confidence pseudo-labels (dotted white box), in contrast to conventional confidence thresholding ($\ge 0.95$). Pseudo-labels are generated using AugSeg [zhao2023augmentation] after 50 epochs on the $\frac{1}{16}$ Pascal VOC split.
  • Figure 1: Pseudo-labeling accuracy in Pascal VOC unlabeled images
  • Figure 2: Overall pipeline of our novel pseudo-labeling based semantic segmentation: (a) End-to-end teacher-student pipeline. (b) We first identify pixels with reliable pseudo-labels using an object detector and a segmentation model. Reliable pseudo-label pixels are those labeled as the same class by both the detection and segmentation models with high confidence scores. (c) We construct a pixel-representation prototype for each class using labeled images and the identified reliable pseudo-label pixels. We then use rank statistics [han2020automatically] to weight the pseudo-labels predicted by the teacher network.
  • Figure 2: Analysis of top-rank indices of class prototypes
  • Figure 3: Demonstration of pseudo-label pixel weighting via rank statistics: This diagram shows the top-$2$ ranking based pseudo-label pixel weighting for two pixels $x^u_i$ and $x^u_j$ in the unlabeled image $x^u$. PL class is the pseudo-label class; GT class is the ground-truth class. Note that the top-2 ranking is the same for $x^u_i$ and the bus pixel prototype, but differs for $x^u_j$ and the car pixel prototype.
  • ...and 12 more figures
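
The reliable-pixel rule described in the Figure 2(b) caption (both models assign the same class with high confidence) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: detections are assumed to be axis-aligned boxes with a class label and a score, a pixel is assumed to agree with the detector when it lies inside a box of the same class, and the thresholds `seg_thresh` and `det_thresh`, the `Detection` type, and the `IGNORE` marker are all hypothetical. Classes that an object detector does not cover are not handled in this sketch.

```python
from typing import List, NamedTuple, Sequence

IGNORE = -1  # hypothetical marker for pixels that are not deemed reliable


class Detection(NamedTuple):
    """Hypothetical detector output: box [x0, x1) x [y0, y1), class, score."""
    x0: int
    y0: int
    x1: int
    y1: int
    label: int
    score: float


def reliable_pseudo_labels(
    seg_labels: Sequence[Sequence[int]],
    seg_conf: Sequence[Sequence[float]],
    detections: Sequence[Detection],
    seg_thresh: float = 0.95,  # assumed value
    det_thresh: float = 0.9,   # assumed value
) -> List[List[int]]:
    """Keep a pixel's pseudo-label only if a confident detection of the same
    class covers it and the segmentation confidence is also high."""
    height = len(seg_labels)
    width = len(seg_labels[0]) if height else 0
    reliable = [[IGNORE] * width for _ in range(height)]
    for det in detections:
        if det.score < det_thresh:
            continue  # skip low-confidence boxes
        # Clip the box to the image bounds before scanning its pixels.
        for y in range(max(det.y0, 0), min(det.y1, height)):
            for x in range(max(det.x0, 0), min(det.x1, width)):
                if seg_labels[y][x] == det.label and seg_conf[y][x] >= seg_thresh:
                    reliable[y][x] = det.label
    return reliable


# Toy 2x3 image: class 1 on the left, class 2 on the right.
seg_labels = [[1, 1, 2], [1, 2, 2]]
seg_conf = [[0.99, 0.80, 0.97], [0.96, 0.99, 0.50]]
detections = [
    Detection(0, 0, 2, 2, label=1, score=0.95),  # confident box for class 1
    Detection(2, 0, 3, 2, label=2, score=0.50),  # low-score box, ignored
]
print(reliable_pseudo_labels(seg_labels, seg_conf, detections))
# [[1, -1, -1], [1, -1, -1]]
```

In this toy case only the two class-1 pixels with segmentation confidence of at least 0.95 inside the confident box survive; every other pixel is marked `IGNORE`.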