Inconsistency Masks: Harnessing Model Disagreement for Stable Semi-Supervised Segmentation

Michael R. H. Vorndran, Bernhard F. Roeck

TL;DR

This work tackles training instability in semi-supervised semantic segmentation caused by confirmation bias from noisy pseudo-labels. It introduces Inconsistency Masks (IM), which treat model disagreement across a small ensemble as a direct signal of uncertainty, enabling selective masking of uncertain pixels during training. Through a generational SSL framework that decouples teacher and student training, IM consistently boosts performance on Cityscapes with both ResNet-50 and DINOv2 backbones and improves scratch-trained models on medical and underwater datasets. IM acts as a general enhancement that can be plugged into existing SSL methods, improving stability and accuracy without requiring large-scale pre-training, and it remains robust in resource-constrained environments. The work also provides extensive ablations, efficiency analyses, and multi-domain demonstrations, highlighting IM’s practical impact for niche domains lacking large-scale pre-training data.

Abstract

A primary challenge in semi-supervised learning (SSL) for segmentation is the confirmation bias from noisy pseudo-labels, which destabilizes training and degrades performance. We propose Inconsistency Masks (IM), a framework that reframes model disagreement not as noise to be averaged away, but as a valuable signal for identifying uncertainty. IM leverages an ensemble of teacher models to generate a mask that explicitly delineates regions where predictions diverge. By filtering these inconsistent areas from input-pseudo-label pairs, our method effectively mitigates the cycle of error propagation common in both continuous and iterative self-training paradigms. Extensive experiments on the Cityscapes benchmark demonstrate IM's effectiveness as a general enhancement framework: when paired with leading approaches like iMAS, U$^2$PL, and UniMatch, our method consistently boosts accuracy, achieving superior benchmarks across ResNet-50 and DINOv2 backbones, and even improving distilled architectures like SegKC. Furthermore, the method's robustness is confirmed in resource-constrained scenarios where pre-trained weights are unavailable. On three additional diverse datasets from medical and underwater domains trained entirely from scratch, IM significantly outperforms standard SSL baselines. Notably, the IM framework is dataset-agnostic, seamlessly handling binary, multi-class, and complex multi-label tasks by operating on discretized predictions. By prioritizing training stability, IM offers a generalizable and robust solution for semi-supervised segmentation, particularly in specialized areas lacking large-scale pre-training data. The full code is available at: https://github.com/MichaelVorndran/InconsistencyMasks
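
The masking step described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the authors' repository: the function names, the `IGNORE_INDEX` value of 255, and the choice to compare every teacher against the first one are assumptions made for illustration. The sketch takes discretized (argmax) class maps from K teachers, marks every pixel where they disagree, and removes those pixels from both the input image and the pseudo-label.

```python
import numpy as np

IGNORE_INDEX = 255  # assumed "ignore" label value; not specified in the source


def inconsistency_mask(label_maps):
    """Return a boolean (H, W) mask that is True where teachers disagree.

    label_maps: (K, H, W) integer class predictions, one map per teacher.
    """
    label_maps = np.asarray(label_maps)
    # A pixel is inconsistent if any teacher differs from the first teacher.
    return np.any(label_maps != label_maps[0], axis=0)


def filter_pair(image, label_maps):
    """Mask inconsistent pixels in an (image, pseudo-label) pair.

    image: (H, W, C) array. Returns (masked_image, pseudo_label, mask).
    """
    label_maps = np.asarray(label_maps)
    mask = inconsistency_mask(label_maps)
    # Keep the agreed class where teachers are consistent, ignore elsewhere.
    pseudo_label = np.where(mask, IGNORE_INDEX, label_maps[0])
    masked_image = image.copy()
    masked_image[mask] = 0  # blank out uncertain pixels in every channel
    return masked_image, pseudo_label, mask


# Three hypothetical teachers predicting classes on a 2x2 image.
teachers = np.array([
    [[0, 1], [2, 2]],
    [[0, 1], [2, 1]],
    [[0, 3], [2, 2]],
])
image = np.full((2, 2, 3), 200, dtype=np.uint8)
masked_image, pseudo_label, mask = filter_pair(image, teachers)
```

Because the comparison runs on discretized label maps rather than on probabilities, the same function applies unchanged to binary and multi-class predictions; a multi-label task could call it once per label channel.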

Paper Structure (74 sections, 5 equations, 17 figures, 10 tables)

Figures (17)

  • Figure 1: Creation of a binary Inconsistency Mask (IM) from two teacher models. (a, b) Binary predictions from Models 1 and 2. (c) Sum of the prediction masks, highlighting areas of agreement (value 2) and disagreement (value 1). (d) The resulting IM marks only the regions of disagreement. (e) The final high-confidence pseudo-label contains only pixels where both models agreed.
  • Figure 2: Visualization of morphological operations on an image from the SUIM dataset. $e$ denotes erosion and $d$ dilation kernel sizes. A value of $0$ signifies that the operation is skipped. The first row displays the input image with the IM applied (masked regions set to black). The second row visualizes the IM structure under varying morphological parameters. The third row shows the corresponding pseudo-label masks: background (black), IM (gray), fish (yellow), and misclassified reef (magenta).
  • Figure 3: The structure of the HeLa multi-label dataset. (a) The label-free brightfield image is the sole input to the model. The ground truth is decomposed into three independent masks: (b) ‘alive’ cells (blue), (c) ‘dead’ cells (magenta), and (d) ‘position’ points (white). (e) The final overlay demonstrates the multi-label nature of the task where masks overlap.
  • Figure 4: Analysis of training dynamics on the SUIM dataset. We plot validation loss (x-axis, lower is better) against method-specific internal metrics (y-axis) over 50 epochs. Color indicates training progress from early (purple) to late (yellow). The trajectories visualize the stability profiles discussed in Sec. \ref{sec_dynamics_analysis}: FixMatch and FPL exhibit pathological divergence (where internal training dynamics decouple from validation performance), U$^2$PL shows noisy guidance, CrossMatch improves systematically before saturating, while iMAS remains stable but largely stagnant.
  • Figure 1.1: Building blocks of the $1\times1$ U-Net.
  • ...and 12 more figures
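
The construction in Figure 1 can be followed step by step in a short NumPy sketch. The function name and the toy predictions are hypothetical, and treating "both models agreed" in panel (e) as "both models predicted foreground" (sum equal to 2) is an assumption based on the caption rather than on the authors' code.

```python
import numpy as np


def binary_im_from_two(pred_a, pred_b):
    """Build the sum map, Inconsistency Mask, and pseudo-label of Figure 1.

    pred_a, pred_b: binary (H, W) predictions with values 0 or 1.
    """
    pred_a = np.asarray(pred_a, dtype=np.uint8)
    pred_b = np.asarray(pred_b, dtype=np.uint8)
    total = pred_a + pred_b  # panel (c): 2 = both foreground, 1 = disagreement
    im = total == 1  # panel (d): only the regions of disagreement
    pseudo_label = (total == 2).astype(np.uint8)  # panel (e): agreed foreground
    return total, im, pseudo_label


# Toy predictions standing in for panels (a) and (b).
model_1 = [[1, 1, 0],
           [0, 1, 0]]
model_2 = [[1, 0, 0],
           [0, 1, 1]]
total, im, pseudo_label = binary_im_from_two(model_1, model_2)
```

Pixels where the sum is 0 are agreed background, so after removing the IM region every remaining pixel carries a label on which both models agree.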