Semi-Supervised Semantic Segmentation via Marginal Contextual Information

Moshe Kimhi, Shai Kimhi, Evgenii Zheltonozhskii, Or Litany, Chaim Baskin

TL;DR

This work targets efficient semi-supervised semantic segmentation by addressing confirmation bias in pseudo labeling. It introduces S4MC, a teacher–student framework that employs dynamic margin-based pseudo labeling and a Marginal Contextual Information refinement to leverage neighboring pixel predictions. Across VOC, Cityscapes, and COCO, S4MC achieves state-of-the-art gains in low-label settings (e.g., +1.39 mIoU on VOC 12 with 366 labeled images) and demonstrates improved label quality alongside increased unlabeled data usage. The approach combines a novel confidence refinement module, dynamic thresholding, and thorough ablations to show robustness to neighborhood design and parameter settings, offering a practical path to reduce annotation costs in dense prediction tasks.

Abstract

We present a novel confidence refinement scheme that enhances pseudo labels in semi-supervised semantic segmentation. Unlike existing methods, which filter pixels with low-confidence predictions in isolation, our approach leverages the spatial correlation of labels in segmentation maps by grouping neighboring pixels and considering their pseudo labels collectively. With this contextual information, our method, named S4MC, increases the amount of unlabeled data used during training while maintaining the quality of the pseudo labels, all with negligible computational overhead. Through extensive experiments on standard benchmarks, we demonstrate that S4MC outperforms existing state-of-the-art semi-supervised learning approaches, offering a promising solution for reducing the cost of acquiring dense annotations. For example, S4MC achieves a 1.39 mIoU improvement over the prior art on PASCAL VOC 12 with 366 annotated images. The code to reproduce our experiments is available at https://s4mcontext.github.io/

Paper Structure (40 sections, 14 equations, 10 figures, 10 tables, 1 algorithm)

This paper contains 40 sections, 14 equations, 10 figures, 10 tables, 1 algorithm.

Figures (10)

  • Figure 1: Confidence refinement observation over one class (Cat). Left: pseudo labels generated without refinement. Middle: pseudo labels obtained from the same model after refinement with marginal contextual information. Right: predicted probabilities of the top two classes for the pixel highlighted by the red square, before refinement (top) and after refinement (bottom). S4MC allows additional correct pseudo labels to propagate.
  • Figure 2: Left: S4MC employs a teacher–student paradigm for semi-supervised segmentation. Labeled images are used to supervise the student network directly; both networks process unlabeled images. Teacher predictions are refined and used to evaluate the margin value, which is then thresholded to produce pseudo labels that guide the student network. The threshold, denoted as $\gamma_t$, is dynamically adjusted based on the teacher network's predictions. Right: Our confidence refinement module exploits neighboring pixels to adjust per-class predictions, as detailed in \ref{sec:pseudo labels}. The refinement changes the class distribution of the pixel marked by the yellow circle on the left. Before refinement, the margin surpasses the threshold and the pixel is erroneously assigned the blue class (dog) as a pseudo label. After refinement, the margin shrinks, thereby preventing error propagation.
  • Figure 3: Qualitative results. Segmentation masks, from left to right: Ground Truth, CutMix-Seg, and CutMix-Seg+S4MC. The heat maps (left: CutMix-Seg; right: CutMix-Seg+S4MC) represent the uncertainty of the model ($\kappa^{-1}$), showing more confident predictions in certain areas and smoother segmentation maps (marked by the red boxes). Additional examples are shown in \ref{sec:visual_resutls}.
  • Figure 4: Pseudo label quantity and quality on PASCAL VOC 12 with 366 labeled images using our margin (\ref{eqn:kappa_margin}) confidence function. Training was performed with S4MC; metrics were calculated both with and without S4MC.
  • Figure A.1: Example of refined pseudo labels. The figure follows the structure of \ref{fig:neigbors}; the numbers under the predictions show the pixel-wise accuracy of the prediction map.
  • ...and 5 more figures
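
The margin-then-threshold step described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the margin is the gap between the two largest class probabilities of a pixel (suggested by the "top two classes" in Figure 1, but not stated here), it uses a fixed threshold `gamma` in place of the dynamically adjusted $\gamma_t$, and it takes the "before" and "after" refinement distributions as hypothetical inputs rather than computing the refinement. The function name `margin_pseudo_labels` and all numbers are made up for illustration.

```python
import numpy as np


def margin_pseudo_labels(probs, gamma):
    """Select pseudo labels by thresholding a per-pixel margin.

    probs: array of shape (H, W, C) with per-pixel class probabilities (C >= 2).
    gamma: fixed threshold (assumption; the paper adjusts it dynamically).
    Returns (labels, mask, margin), each of shape (H, W).
    """
    # Partition so the last two entries along the class axis are the
    # second-largest and largest probabilities, in that order.
    top2 = np.partition(probs, -2, axis=-1)[..., -2:]
    # Assumed margin definition: largest minus second-largest probability.
    margin = top2[..., 1] - top2[..., 0]
    labels = probs.argmax(axis=-1)
    # Only pixels whose margin exceeds the threshold keep their pseudo label.
    mask = margin > gamma
    return labels, mask, margin


# Hypothetical 1x2 "image" with 3 classes: the same pixel's distribution
# before refinement (first entry) and after refinement (second entry).
probs = np.array([[[0.70, 0.20, 0.10],
                   [0.50, 0.40, 0.10]]])
labels, mask, margin = margin_pseudo_labels(probs, gamma=0.3)
print(labels)  # predicted class per pixel
print(mask)    # which pixels would be used as pseudo labels
```

With these made-up numbers, the first distribution has a wide margin and passes the threshold, while the second has a narrow margin and is filtered out, mirroring the situation in Figure 2 where a reduced margin keeps an erroneous pseudo label from reaching the student.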