Integrating Semi-Supervised and Active Learning for Semantic Segmentation
Wanli Ma, Oktay Karakus, Paul L. Rosin
TL;DR
The paper tackles the high annotation cost of semantic segmentation by integrating semi-supervised learning (SSL) with active learning (AL). It introduces a Teacher-Student-Friend (TSF) framework augmented with a Pseudo-label Auto-refinement (PLAR) module, which comprises an Error Mask Decoder (EMD) and a feature-space-based refinement strategy using Euclidean and Mahalanobis distances. The approach uses both labelled and unlabelled data more efficiently, improving performance on Cityscapes and ISPRS Vaihingen under significantly reduced labelling budgets. Ablation studies isolate the contributions of EMD and PLAR, and empirical results show competitive or superior performance relative to state-of-the-art SSL and AL methods on both benchmarks.
Abstract
In this paper, we propose a novel active learning approach integrated with an improved semi-supervised learning framework to reduce the cost of manual annotation and enhance model performance. Our proposed approach effectively leverages both the labelled data selected through active learning and the unlabelled data excluded from the selection process. The proposed active learning approach pinpoints areas where the pseudo-labels are likely to be inaccurate. Then, an automatic and efficient pseudo-label auto-refinement (PLAR) module is proposed to correct pixels with potentially erroneous pseudo-labels by comparing their feature representations with those of labelled regions. This approach operates without increasing the labelling budget and is based on the cluster assumption, which states that pixels belonging to the same class should exhibit similar representations in feature space. Furthermore, manual labelling is only applied to the most difficult and uncertain areas in unlabelled data, where insufficient information prevents the PLAR module from making a decision. We evaluated the proposed hybrid semi-supervised active learning framework on two benchmark datasets, one from natural and the other from remote sensing imagery domains. In both cases, it outperformed state-of-the-art methods in the semantic segmentation task.
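The feature-space refinement idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`class_statistics`, `refine_pseudo_labels`), the use of per-class mean prototypes, the covariance regulariser `eps`, and the boolean `error_mask` input (standing in for whatever the error-detection step flags) are all assumptions made for illustration. Under the cluster assumption, each flagged pixel is reassigned to the class whose labelled-feature statistics are closest, using either squared Euclidean or squared Mahalanobis distance.

```python
import numpy as np


def class_statistics(features, labels, num_classes, eps=1e-3):
    """Per-class mean and inverse covariance of labelled pixel features.

    features: (N, D) array, labels: (N,) integer array.
    Assumes every class has at least two labelled pixels.
    eps is a hypothetical ridge term that keeps the covariance invertible.
    """
    dim = features.shape[1]
    means = np.zeros((num_classes, dim))
    inv_covs = np.zeros((num_classes, dim, dim))
    for c in range(num_classes):
        class_feats = features[labels == c]
        means[c] = class_feats.mean(axis=0)
        cov = np.cov(class_feats, rowvar=False) + eps * np.eye(dim)
        inv_covs[c] = np.linalg.inv(cov)
    return means, inv_covs


def refine_pseudo_labels(features, pseudo_labels, error_mask, means, inv_covs=None):
    """Relabel pixels flagged in error_mask by their nearest class in feature space.

    With inv_covs=None the squared Euclidean distance to each class mean is used;
    otherwise the squared Mahalanobis distance under each class covariance.
    Pixels outside error_mask keep their original pseudo-label.
    """
    # Differences between each flagged pixel and each class mean: (M, C, D)
    diffs = features[error_mask][:, None, :] - means[None, :, :]
    if inv_covs is None:
        dist2 = np.einsum("mcd,mcd->mc", diffs, diffs)
    else:
        dist2 = np.einsum("mcd,cde,mce->mc", diffs, inv_covs, diffs)
    refined = pseudo_labels.copy()
    refined[error_mask] = dist2.argmin(axis=1)
    return refined
```

In this sketch, features would be flattened to shape `(num_pixels, feature_dim)` before the call. A practical variant would presumably also leave a pixel unrefined when no class is clearly closest, which corresponds to the uncertain regions that the abstract reserves for manual labelling; that thresholding is omitted here.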
