Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier

Prantik Howlader, Srijan Das, Hieu Le, Dimitris Samaras

TL;DR

This work addresses the limited contextual information in semi-supervised semantic segmentation by introducing a plug-in Multi-scale Patch-based Multi-label Classifier (MPMC) that provides patch-level supervision across multiple scales. MPMC also derives adaptive per-patch weights from patch confidences to mitigate the impact of noisy teacher pseudo-labels within a teacher–student framework, enabling joint training of the segmentation network and the MPMC module. Evaluations on Cityscapes, Pascal VOC, and ACDC show consistent improvements across four SSS baselines, with especially strong gains in low-label settings and across both natural and medical datasets. By enforcing explicit patch-context preservation and selective weighting, MPMC enhances discrimination between neighboring classes and supports more reliable learning from unlabeled data.

Abstract

Incorporating pixel contextual information is critical for accurate segmentation. In this paper, we show that an effective way to incorporate contextual information is through a patch-based classifier. This patch classifier is trained to identify the classes present within an image region, which helps eliminate distractors and improves the classification of small object segments. Specifically, we introduce the Multi-scale Patch-based Multi-label Classifier (MPMC), a novel plug-in module designed for existing semi-supervised segmentation (SSS) frameworks. MPMC offers patch-level supervision, enabling the discrimination of pixel regions of different classes within a patch. Furthermore, MPMC learns an adaptive pseudo-label weight, using patch-level classification to alleviate the impact of the teacher's noisy pseudo-label supervision on the student. This lightweight module can be integrated into any SSS framework, significantly enhancing its performance. We demonstrate the efficacy of MPMC by integrating it into four SSS methodologies and improving the segmentation results of these baselines on all three evaluated datasets: two natural-image datasets and one medical segmentation dataset.
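To make the idea of patch-level multi-label supervision concrete, the following is a minimal sketch of how multi-hot class targets could be derived from a pixel label map at several patch sizes. It is an illustration under stated assumptions, not the authors' implementation: the function names, the non-overlapping tiling, the patch sizes, and the `ignore_index` value of 255 are all hypothetical, and the paper itself classifies receptive-field patches of encoder features rather than raw label tiles.

```python
from typing import Dict, List

LabelMap = List[List[int]]


def patch_multilabel_targets(
    label_map: LabelMap,
    patch: int,
    num_classes: int,
    ignore_index: int = 255,  # assumed "void" label, not taken from the paper
) -> List[List[List[int]]]:
    """Return one multi-hot class vector per non-overlapping patch x patch tile."""
    height, width = len(label_map), len(label_map[0])
    targets = []
    for top in range(0, height, patch):
        row = []
        for left in range(0, width, patch):
            present = [0] * num_classes
            # Mark every class that occurs at least once inside this tile.
            for y in range(top, min(top + patch, height)):
                for x in range(left, min(left + patch, width)):
                    cls = label_map[y][x]
                    if cls != ignore_index:
                        present[cls] = 1
            row.append(present)
        targets.append(row)
    return targets


def multiscale_targets(
    label_map: LabelMap, patch_sizes: List[int], num_classes: int
) -> Dict[int, List[List[List[int]]]]:
    """Build patch-level targets for each (hypothetical) patch size."""
    return {
        p: patch_multilabel_targets(label_map, p, num_classes)
        for p in patch_sizes
    }


# Toy 4x4 label map with three classes (0, 1, 2).
toy_labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [0, 0, 2, 2],
]
targets = multiscale_targets(toy_labels, patch_sizes=[2, 4], num_classes=3)
# targets[2] == [[[1, 0, 0], [0, 1, 0]], [[1, 0, 1], [0, 0, 1]]]
# targets[4] == [[[1, 1, 1]]]
```

In the toy example, the bottom-left 2x2 tile contains classes 0 and 2, so its target is `[1, 0, 1]`. A pixel inside that tile that the segmentation network assigns to class 1 would contradict the patch-level label, which is the kind of distractor the abstract describes eliminating.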

Paper Structure (13 sections, 8 equations, 14 figures, 11 tables)

Figures (14)

  • Figure 1: Segmentations and multi-label classifications of image patches (Cityscapes). After 20 epochs, (b) and (c) show the segmentations and the classes present in those segmentations, respectively, and (d) shows the classes predicted by the multi-label classifier in MPMC. (f) and (g) are segmentations by the baseline (AugSeg [zhao2023augmentation]) and by the baseline + MPMC at the end of training. White boxes mark regions where MPMC improves the baseline, while orange boxes mark regions where the segmentation network predicts classes not present in the patch.
  • Figure 1: Label-wise energy scores for true positives and false negatives for all classes in the Cityscapes validation set.
  • Figure 2: Overall pipeline of our multi-label-classification-based semantic segmentation. (a) Left: end-to-end teacher-student pipeline with our method, the Multi-scale Patch-based Multi-label Classifier (MPMC). (b) Right: for unlabeled images, the teacher MPMC extracts features from a layer in the segmentation model's encoder to classify the feature's receptive-field patch. The confidence of MPMC for a class in a patch is used to calculate two adaptive weights, $\lambda_s$ and $\lambda_m$, which reduce the influence of noisy teacher predictions in that patch when training both the student segmentation network and MPMC.
  • Figure 2: Qualitative results on the ACDC dataset: (a) input image, (b) ground truth, (c) segmentations generated by UniMatch [yang2023revisiting], and (d) segmentations generated when our method (MPMC) is integrated into UniMatch. White boxes show the areas where our method improves the baseline [yang2023revisiting].
  • Figure 3: Qualitative results on the Cityscapes dataset: (a) original image, (b) ground truth, (c) segmentations generated by AugSeg [zhao2023augmentation], and (d) segmentations generated when our method (MPMC) is integrated into AugSeg. White boxes show the areas where our method improves the baseline.
  • ...and 9 more figures
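
The Figure 2 caption describes using the patch classifier's confidence to down-weight noisy teacher predictions. The sketch below illustrates that general idea only. It uses a single hypothetical per-pixel weight, the sigmoid confidence of the patch classifier for the pixel's pseudo-label class, and does not reproduce the paper's $\lambda_s$ or $\lambda_m$, whose formulas are not given in this excerpt. All function names and the non-overlapping patch layout are assumptions.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def pixel_weights(
    pseudo_labels: List[List[int]],
    patch_logits: List[List[List[float]]],
    patch: int,
) -> List[List[float]]:
    """Weight each pixel by the patch classifier's confidence that the
    pixel's pseudo-label class is present in the patch containing it.

    patch_logits[i][j][c] is the (assumed) multi-label logit for class c
    in the non-overlapping patch at grid position (i, j).
    """
    weights = []
    for y, row in enumerate(pseudo_labels):
        weights.append(
            [sigmoid(patch_logits[y // patch][x // patch][cls])
             for x, cls in enumerate(row)]
        )
    return weights


def weighted_mean_loss(
    per_pixel_loss: List[List[float]], weights: List[List[float]]
) -> float:
    """Confidence-weighted average of a per-pixel loss map."""
    total = sum(
        loss * w
        for loss_row, w_row in zip(per_pixel_loss, weights)
        for loss, w in zip(loss_row, w_row)
    )
    norm = sum(w for w_row in weights for w in w_row)
    return total / norm if norm > 0 else 0.0


# One 2x2 patch: the patch classifier is unsure about class 0 (logit 0.0)
# and confident about class 1 (logit 4.0).
toy_pseudo_labels = [[0, 1], [1, 1]]
toy_patch_logits = [[[0.0, 4.0]]]
toy_weights = pixel_weights(toy_pseudo_labels, toy_patch_logits, patch=2)
```

Here the pixel pseudo-labeled as class 0 receives weight 0.5, while the pixels pseudo-labeled as class 1 receive weights above 0.98, so a per-pixel loss on the doubtful pixel contributes less to the weighted average.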