Semi-supervised Semantic Segmentation with Multi-Constraint Consistency Learning
Jianjian Yin, Tao Chen, Gensheng Pei, Yazhou Yao, Liqiang Nie, Xiansheng Hua
TL;DR
The paper tackles semi-supervised semantic segmentation by addressing the underutilization of supervisory information in prior consistency-regularization methods. It introduces Multi-Constraint Consistency Learning (MCCL), which combines two components. Feature Knowledge Alignment (FKA) enforces image-augmentation-based feature consistency between strongly and weakly augmented views. Self-Adaptive Intervention (SAI) broadens decoder learning through feature perturbations. FKA comprises point-to-point alignment of the strong-view and weak-view features $F_s$ and $F_w$ and prototype-based intra-class compactness around class prototypes $\rho_k$. SAI generates perturbed features $F_{mk}$ and $F_{ne}$, whose associated losses encourage predictions to stay consistent with the weak-view prediction $p_w$ under broader feature variations. The training objective combines a supervised loss $L_s$ on labeled data with a multi-term unsupervised loss $L_u = \alpha L_{p2p} + \omega L_{dt} + \beta (L_m + L_n)$. Experiments on Pascal VOC2012 and Cityscapes show state-of-the-art performance, validating the effectiveness of staged encoder/decoder enhancement and multi-constraint consistency. The approach is demonstrated with both CNN-based and Transformer backbones, and the code is released for reproducibility and further research.
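To make the FKA terms and the composition of $L_u$ concrete, here is a minimal pure-Python sketch built on several assumptions. Both FKA terms use cosine distance. Per-pixel pseudo-labels (for example, the argmax of $p_w$) are given. Each prototype $\rho_k$ is the mean weak-view feature of pixels labeled $k$. $L_{dt}$ is taken to be the prototype-based compactness term. These choices and all function names are hypothetical illustrations, not the paper's implementation. $L_m$ and $L_n$ are passed in as precomputed numbers.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]  # one pixel's C-dimensional feature


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def p2p_loss(f_s: List[Vector], f_w: List[Vector]) -> float:
    """Point-to-point alignment (assumed form): mean (1 - cos) between
    co-located strong-view (F_s) and weak-view (F_w) features."""
    return sum(1.0 - cosine(s, w) for s, w in zip(f_s, f_w)) / len(f_s)


def class_prototypes(f_w: List[Vector], labels: List[int]) -> Dict[int, List[float]]:
    """Prototype rho_k = mean weak-view feature of pixels pseudo-labeled k (assumption)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for feat, k in zip(f_w, labels):
        if k not in sums:
            sums[k] = [0.0] * len(feat)
            counts[k] = 0
        sums[k] = [acc + x for acc, x in zip(sums[k], feat)]
        counts[k] += 1
    return {k: [x / counts[k] for x in v] for k, v in sums.items()}


def compactness_loss(f_s: List[Vector], f_w: List[Vector], labels: List[int]) -> float:
    """Intra-class compactness (assumed form): pull each strong-view feature
    toward the prototype of its pseudo-label class."""
    protos = class_prototypes(f_w, labels)
    return sum(1.0 - cosine(s, protos[k]) for s, k in zip(f_s, labels)) / len(f_s)


def unsup_loss(l_p2p: float, l_dt: float, l_m: float, l_n: float,
               alpha: float = 1.0, omega: float = 1.0, beta: float = 1.0) -> float:
    """L_u = alpha * L_p2p + omega * L_dt + beta * (L_m + L_n)."""
    return alpha * l_p2p + omega * l_dt + beta * (l_m + l_n)
```

With `f_w = [[1.0, 0.0], [0.0, 1.0]]` and `f_s = [[0.0, 1.0], [0.0, 1.0]]`, the first pixel is orthogonal across views and the second is identical, so `p2p_loss(f_s, f_w)` is about 0.5. Computing prototypes from the weak view, rather than the strong view, mirrors the usual practice of treating the weak view as the more reliable target.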
Abstract
Consistency regularization has become prevalent in semi-supervised semantic segmentation and has achieved promising performance. However, existing methods typically concentrate on enhancing Image-augmentation based Prediction consistency and on optimizing the segmentation network as a whole, which leaves potential supervisory information underutilized. In this paper, we propose a Multi-Constraint Consistency Learning (MCCL) approach that enhances the encoder and decoder in stages. Specifically, we first design a feature knowledge alignment (FKA) strategy to promote feature consistency learning of the encoder under image augmentation. FKA encourages the encoder to derive consistent features for strongly and weakly augmented views from two perspectives: point-to-point alignment and prototype-based intra-class compactness. Moreover, we propose a self-adaptive intervention (SAI) module that increases the discrepancy of the aligned intermediate feature representations, promoting Feature-perturbation based Prediction consistency learning. Self-adaptive feature masking and noise injection are designed in an instance-specific manner to perturb the features, enabling robust learning of the decoder. Experimental results on the Pascal VOC2012 and Cityscapes datasets demonstrate that MCCL achieves new state-of-the-art performance. The source code and models are available at https://github.com/NUST-Machine-Intelligence-Laboratory/MCCL.
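The instance-specific perturbations in SAI can be pictured with the following sketch. Every adaptive rule here is an assumption, not the paper's method. The masking ratio scales with the instance's mean weak-view confidence. The noise standard deviation scales with the instance's own feature spread. The function names (`instance_confidence`, `self_adaptive_mask`, `self_adaptive_noise`) are hypothetical. One instance's features are flattened to a list of per-position channel vectors, and the two outputs play the roles of $F_{mk}$ and $F_{ne}$.

```python
import math
import random
from typing import List

Feature = List[List[float]]  # one instance: N spatial positions x C channels


def instance_confidence(probs: List[List[float]]) -> float:
    """Mean max class probability of one instance's weak-view prediction p_w."""
    return sum(max(p) for p in probs) / len(probs)


def self_adaptive_mask(feat: Feature, confidence: float, max_ratio: float,
                       rng: random.Random) -> Feature:
    """Hypothetical F_mk: zero out a confidence-scaled number of randomly
    chosen positions (more confident instances receive stronger masking)."""
    n = len(feat)
    k = min(n, max(0, round(max_ratio * confidence * n)))
    masked = set(rng.sample(range(n), k))
    return [[0.0] * len(v) if i in masked else list(v) for i, v in enumerate(feat)]


def self_adaptive_noise(feat: Feature, scale: float, rng: random.Random) -> Feature:
    """Hypothetical F_ne: add Gaussian noise whose std is `scale` times the
    standard deviation of this instance's own feature values."""
    values = [x for v in feat for x in v]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
    sigma = scale * std
    return [[x + rng.gauss(0.0, sigma) for x in v] for v in feat]
```

For example, `self_adaptive_mask(feat, 1.0, 0.5, random.Random(0))` on a four-position feature zeroes exactly two positions and leaves the others unchanged. Tying the perturbation strength to per-instance statistics is one simple way to make it "self-adaptive". Presumably each perturbed feature is then decoded and its prediction is supervised by pseudo-labels derived from $p_w$.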
