Weighting Pseudo-Labels via High-Activation Feature Index Similarity and Object Detection for Semi-Supervised Segmentation
Prantik Howlader, Hieu Le, Dimitris Samaras
TL;DR
This work tackles unreliable pseudo-labels in semi-supervised semantic segmentation with a two-step approach. First, it identifies reliable pseudo-label pixels by combining the predictions of a segmentation model and a trained object detector. Second, it assigns each pseudo-labeled pixel a learning weight derived from rank statistics against class prototypes, which are stored in a memory bank and built from labeled pixels and reliable pseudo-label pixels. The per-pixel weight is $W^{PL}_i = s/k$ with $k=5$, where $s$ is the number of top-$k$ activation indices shared by the pixel feature and the prototype of its class. This weight modulates the unsupervised loss $L_u$ in the overall objective $L = L_s + \alpha L_u$, where $L_s$ is the standard supervised loss. Because the weighting is robust to noise, it is especially useful in the early training stages, when wrong pseudo-labels are most common. The method is designed to be drop-in compatible with four strong SSL frameworks (AugSeg, AEL, U2PL, UniMatch) and yields consistent improvements on Cityscapes and Pascal VOC; it also adapts to Transformer-based models and MS COCO. Together, these contributions make semi-supervised segmentation more reliable in low-label regimes and across different architectures and datasets.
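The weight $W^{PL}_i = s/k$ and the objective $L = L_s + \alpha L_u$ can be illustrated with a minimal sketch. This is not the authors' implementation: plain Python lists stand in for feature tensors, the function names are hypothetical, ties between equal activations are broken by index order, and averaging the weighted per-pixel losses to obtain $L_u$ is an assumption, since the summary does not state the normalization.

```python
from typing import Sequence, Set


def topk_indices(features: Sequence[float], k: int) -> Set[int]:
    """Return the indices of the k largest activations in a feature vector."""
    order = sorted(range(len(features)), key=lambda i: features[i], reverse=True)
    return set(order[:k])


def pseudo_label_weight(pixel_feat: Sequence[float],
                        prototype: Sequence[float],
                        k: int = 5) -> float:
    """W = s / k, where s counts top-k indices shared by pixel and prototype."""
    shared = topk_indices(pixel_feat, k) & topk_indices(prototype, k)
    return len(shared) / k


def total_loss(sup_loss: float,
               unsup_pixel_losses: Sequence[float],
               weights: Sequence[float],
               alpha: float = 1.0) -> float:
    """L = L_s + alpha * L_u.

    Assumption: L_u is the mean of the per-pixel losses, each scaled by
    its pseudo-label weight.
    """
    weighted = [w * l for w, l in zip(weights, unsup_pixel_losses)]
    unsup_loss = sum(weighted) / len(weighted)
    return sup_loss + alpha * unsup_loss


# Toy 8-dimensional features (made-up values for illustration only).
prototype = [0.9, 0.1, 0.8, 0.7, 0.05, 0.6, 0.5, 0.02]
pixel = [0.8, 0.7, 0.9, 0.01, 0.6, 0.5, 0.02, 0.03]
weight = pseudo_label_weight(pixel, prototype, k=5)
```

In this toy case the prototype's top-5 indices are {0, 2, 3, 5, 6} and the pixel's are {0, 1, 2, 4, 5}, so three indices are shared and the weight is 3/5. Only the positions of the strongest activations matter, not their magnitudes, which is why the measure tolerates noisy feature values.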
Abstract
Semi-supervised semantic segmentation methods leverage unlabeled data by assigning pseudo-labels to it. The success of these methods therefore hinges on the reliability of the pseudo-labels. Existing methods mostly select high-confidence pixels in an effort to avoid erroneous pseudo-labels. However, high confidence does not guarantee correct pseudo-labels, especially in the initial training iterations. In this paper, we propose a novel approach to reliably learn from pseudo-labels. First, we unify the predictions from a trained object detector and a semantic segmentation model to identify reliable pseudo-label pixels. Second, we assign different learning weights to pseudo-labeled pixels to avoid noisy training signals. To determine these weights, we first use the labeled pixels and the reliable pseudo-label pixels identified in the first step to construct a prototype for each class. The per-pixel weight is then the structural similarity between the pixel and the prototype, measured via rank-statistics similarity. This metric is robust to noise, making it better suited for comparing features from unlabeled images, particularly in the initial training phases, where wrong pseudo-labels are prone to occur. We show that our method can be easily integrated into four semi-supervised semantic segmentation frameworks and improves them on both the Cityscapes and Pascal VOC datasets.
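The first step, unifying detector and segmentation predictions, can be sketched as follows. The abstract does not specify the fusion rule, so this example assumes a simple agreement criterion: a pixel counts as reliable when its predicted segmentation class matches the class of a detected box that contains it. The function name, the box format, and the rule itself are illustrative assumptions, not the authors' procedure.

```python
from typing import List, Tuple

# Hypothetical box format: (class_id, x0, y0, x1, y1), with x1 and y1 exclusive.
Box = Tuple[int, int, int, int, int]


def reliable_mask(seg_pred: List[List[int]], boxes: List[Box]) -> List[List[bool]]:
    """Mark pixels whose segmentation class agrees with an enclosing detection.

    Assumed rule: a pixel is reliable if it lies inside at least one detected
    box whose class equals the pixel's predicted segmentation class.
    """
    height, width = len(seg_pred), len(seg_pred[0])
    mask = [[False] * width for _ in range(height)]
    for class_id, x0, y0, x1, y1 in boxes:
        # Clip the box to the image bounds before scanning it.
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                if seg_pred[y][x] == class_id:
                    mask[y][x] = True
    return mask


# Toy 2x3 prediction map and one detection of class 1 over the left 2x2 region.
seg_pred = [[1, 1, 0],
            [1, 2, 0]]
mask = reliable_mask(seg_pred, [(1, 0, 0, 2, 2)])
```

Here the three pixels predicted as class 1 inside the box are marked reliable, while the pixel predicted as class 2 inside the same box is not. Under this assumed rule, classes that the detector does not cover would never be marked reliable, so a real pipeline would need a separate criterion for them.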
