Inconsistency Masks: Harnessing Model Disagreement for Stable Semi-Supervised Segmentation
Michael R. H. Vorndran, Bernhard F. Roeck
TL;DR
This work tackles training instability in semi-supervised semantic segmentation caused by confirmation bias from noisy pseudo-labels. It introduces Inconsistency Masks (IM), which treat disagreement among the models of a small ensemble as a direct signal of uncertainty, so that uncertain pixels can be selectively masked during training. Using a generational SSL framework that decouples teacher and student training, IM consistently boosts performance on Cityscapes with both ResNet-50 and DINOv2 backbones and improves scratch-trained models on medical and underwater datasets. IM acts as a general enhancement that can be plugged into existing SSL methods, yielding stability and improved accuracy without requiring large-scale pre-training, and it remains robust in resource-constrained environments. The work also provides extensive ablations, efficiency analyses, and multi-domain demonstrations, highlighting IM's practical impact for niche domains lacking large-scale pre-training data.
Abstract
A primary challenge in semi-supervised learning (SSL) for segmentation is confirmation bias from noisy pseudo-labels, which destabilizes training and degrades performance. We propose Inconsistency Masks (IM), a framework that reframes model disagreement not as noise to be averaged away, but as a valuable signal for identifying uncertainty. IM uses an ensemble of teacher models to generate a mask that explicitly delineates the regions where their predictions diverge. By filtering these inconsistent areas from input-pseudo-label pairs, our method mitigates the cycle of error propagation common in both continuous and iterative self-training paradigms. Extensive experiments on the Cityscapes benchmark demonstrate IM's effectiveness as a general enhancement framework: when paired with leading approaches like iMAS, U$^2$PL, and UniMatch, our method consistently boosts accuracy across ResNet-50 and DINOv2 backbones, and even improves distilled architectures like SegKC. Furthermore, the method's robustness is confirmed in resource-constrained scenarios where pre-trained weights are unavailable. On three additional diverse datasets from medical and underwater domains, trained entirely from scratch, IM significantly outperforms standard SSL baselines. Notably, because it operates on discretized predictions, the IM framework is dataset-agnostic and handles binary, multi-class, and complex multi-label tasks. By prioritizing training stability, IM offers a generalizable and robust solution for semi-supervised segmentation, particularly in specialized areas lacking large-scale pre-training data. The full code is available at: https://github.com/MichaelVorndran/InconsistencyMasks
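The masking idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the authors' repository; the function names, the unanimous-agreement criterion, the ignore label `255`, and the choice to zero out masked input pixels are all assumptions made for illustration.

```python
import numpy as np

IGNORE_INDEX = 255  # hypothetical "ignore" label, an assumption of this sketch


def inconsistency_mask(preds):
    """Return a boolean (H, W) mask that is True where teachers disagree.

    preds: (K, H, W) integer array of discretized class maps, one per teacher.
    Assumption: a pixel counts as consistent only if all K teachers agree.
    """
    preds = np.asarray(preds)
    return np.any(preds != preds[0], axis=0)


def mask_pair(image, preds):
    """Filter inconsistent pixels from an (input, pseudo-label) pair.

    image: (H, W, C) array. Returns (masked_image, pseudo_label, mask).
    """
    preds = np.asarray(preds)
    mask = inconsistency_mask(preds)
    # On consistent pixels every teacher predicts the same class,
    # so the first teacher's map can serve as the pseudo-label there.
    pseudo_label = preds[0].copy()
    pseudo_label[mask] = IGNORE_INDEX
    # Assumption: inconsistent pixels are blanked in the input as well.
    masked_image = image.copy()
    masked_image[mask] = 0
    return masked_image, pseudo_label, mask


# Three hypothetical teachers on a 2x2 image; they disagree on one pixel.
teacher_preds = np.array([
    [[0, 1], [2, 2]],
    [[0, 1], [2, 3]],
    [[0, 1], [2, 2]],
])
image = np.ones((2, 2, 3), dtype=np.float32)
masked_image, pseudo_label, mask = mask_pair(image, teacher_preds)
```

In this toy case only the bottom-right pixel is flagged, so a student trained on the pair would receive no supervision there, provided its loss skips `IGNORE_INDEX`.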
