Exploiting Minority Pseudo-Labels for Semi-Supervised Fine-grained Road Scene Understanding
Yuting Hong, Yongkang Wu, Hui Xiao, Huazheng Hao, Xiaojie Qiu, Baochen Yao, Chengbin Peng
TL;DR
The paper tackles semi-supervised semantic segmentation for fine-grained road scenes, where long-tailed class distributions leave minority classes underrepresented. It introduces STPG, a framework that pairs a professional module trained on minority-class pseudo-labels with a general module trained on all pseudo-labels, and adds anchor-based contrastive learning that distributes class representations evenly in the feature space. A mismatch-score-driven selection of minority pseudo-labels further improves learning of hard classes, while cross-guidance between the two modules reduces model coupling. Empirically, STPG yields gains on Cityscapes, CamVid, and PASCAL VOC 2012, with substantial improvements on tail classes and robust performance under limited labeled data, which suggests practical value for autonomous driving perception.
Abstract
In fine-grained road scene understanding, semantic segmentation plays a crucial role in enabling vehicles to perceive and comprehend their surroundings. By assigning a class label to each pixel in an image, it allows precise identification and localization of detailed road features, which is vital for high-quality scene understanding and downstream perception tasks. A key challenge in this domain is improving the recognition of minority classes while mitigating the dominance of majority classes, which is essential for balanced and robust overall performance. However, traditional semi-supervised learning methods often overlook this class imbalance during training. To address this issue, we first propose a general training module that learns from all pseudo-labels without a conventional filtering strategy. Second, we propose a professional training module that learns specifically from reliable minority-class pseudo-labels, which are identified by a novel mismatch score metric. The two modules supervise each other, which reduces model coupling; low coupling is essential for semi-supervised learning. During contrastive learning, to prevent majority classes from dominating the feature space, we propose a strategy that assigns evenly distributed anchors to the different classes. Experimental results on multiple public benchmarks show that our method surpasses traditional approaches in recognizing tail classes.
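To make the idea of evenly distributed class anchors concrete, the sketch below builds one unit-length anchor per class such that every pair of anchors has the same cosine similarity. It uses a simplex construction, which is an assumption chosen for illustration and not necessarily the anchor assignment used in the paper; the function names `simplex_anchors` and `cosine` are hypothetical.

```python
import math


def simplex_anchors(num_classes):
    """Return num_classes unit vectors in R^num_classes with equal pairwise angles.

    Assumption: a regular simplex is used as one possible notion of
    "evenly distributed"; the paper's own construction may differ.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    k = num_classes
    # Centering the one-hot vectors gives norm sqrt((k - 1) / k); rescale to unit length.
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if j == i else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Cityscapes is evaluated on 19 classes, so build 19 anchors.
anchors = simplex_anchors(19)
# Every pair of distinct anchors has cosine similarity -1 / (k - 1).
pairwise = cosine(anchors[0], anchors[1])
```

Because each anchor is fixed independently of class frequency, a rare class receives the same angular margin as a frequent one, which is the property the abstract attributes to evenly distributed anchors. In a contrastive loss, pixel features of a class would be pulled toward that class's anchor and pushed away from the others.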
