Semi-supervised Semantic Segmentation with Multi-Constraint Consistency Learning

Jianjian Yin, Tao Chen, Gensheng Pei, Yazhou Yao, Liqiang Nie, Xiansheng Hua

TL;DR

The paper addresses semi-supervised semantic segmentation, arguing that prior consistency-regularization methods underuse the available supervisory information. It introduces Multi-Constraint Consistency Learning (MCCL), which combines two components: Feature Knowledge Alignment (FKA), which enforces image-augmentation based feature consistency between strongly and weakly augmented views, and Self-Adaptive Intervention (SAI), which broadens decoder learning through feature perturbations. FKA comprises point-to-point alignment of the features $F_s$ and $F_w$ and prototype-based intra-class compactness around class prototypes $\rho_k$. SAI generates the perturbed features $F_{mk}$ (masked) and $F_{ne}$ (noise-injected), whose predictions are supervised by the weak-view prediction $p_w$ so that predictions stay consistent under broader feature variations. The training objective combines a supervised loss $L_s$ on labeled data with a multi-term unsupervised loss $L_u = \alpha L_{p2p} + \omega L_{dt} + \beta (L_m + L_n)$. Experiments on Pascal VOC2012 and Cityscapes report state-of-the-art performance, supporting the staged enhancement of encoder and decoder and the use of multiple consistency constraints. The approach is demonstrated with both CNN-based and Transformer backbones, and the code is released for reproducibility and further research.
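As a minimal sketch of how the loss terms above fit together, the snippet below combines already-computed scalar losses according to $L_u = \alpha L_{p2p} + \omega L_{dt} + \beta (L_m + L_n)$. The function names, the default weights of 1.0, and the plain sum of $L_s$ and $L_u$ are assumptions made for illustration; the summary does not state how the two parts are weighted against each other.

```python
def unsupervised_loss(l_p2p, l_dt, l_m, l_n, alpha=1.0, omega=1.0, beta=1.0):
    """L_u = alpha * L_p2p + omega * L_dt + beta * (L_m + L_n).

    The default weights are placeholders, not values from the paper.
    """
    return alpha * l_p2p + omega * l_dt + beta * (l_m + l_n)


def total_loss(l_s, l_u):
    """Assumption: supervised and unsupervised parts are simply added."""
    return l_s + l_u


if __name__ == "__main__":
    # Hypothetical per-batch loss values, for illustration only.
    l_u = unsupervised_loss(0.2, 0.1, 0.3, 0.4, alpha=0.5, omega=2.0, beta=1.0)
    print(total_loss(0.6, l_u))
```

Keeping the weights as explicit arguments makes it easy to switch individual constraints off (for example `beta=0.0` removes both SAI terms) when studying their separate contributions.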

Abstract

Consistency regularization has prevailed in semi-supervised semantic segmentation and achieved promising performance. However, existing methods typically concentrate on enhancing the Image-augmentation based Prediction consistency and optimizing the segmentation network as a whole, resulting in insufficient utilization of potential supervisory information. In this paper, we propose a Multi-Constraint Consistency Learning (MCCL) approach to facilitate the staged enhancement of the encoder and decoder. Specifically, we first design a feature knowledge alignment (FKA) strategy to promote the feature consistency learning of the encoder from image-augmentation. Our FKA encourages the encoder to derive consistent features for strongly and weakly augmented views from the perspectives of point-to-point alignment and prototype-based intra-class compactness. Moreover, we propose a self-adaptive intervention (SAI) module to increase the discrepancy of aligned intermediate feature representations, promoting Feature-perturbation based Prediction consistency learning. Self-adaptive feature masking and noise injection are designed in an instance-specific manner to perturb the features for robust learning of the decoder. Experimental results on Pascal VOC2012 and Cityscapes datasets demonstrate that our proposed MCCL achieves new state-of-the-art performance. The source code and models are made available at https://github.com/NUST-Machine-Intelligence-Laboratory/MCCL.
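The two FKA perspectives named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the cosine-distance form of the point-to-point term, the masked-average prototypes, the squared-distance compactness term, and all function names are assumptions chosen to make the idea concrete. Feature maps are assumed to have shape (C, H, W) and labels (or pseudo-labels) shape (H, W).

```python
import numpy as np


def point_to_point_loss(f_s, f_w, eps=1e-8):
    """Mean (1 - cosine similarity) between per-pixel features of two views."""
    num = (f_s * f_w).sum(axis=0)
    den = np.linalg.norm(f_s, axis=0) * np.linalg.norm(f_w, axis=0) + eps
    return float(np.mean(1.0 - num / den))


def class_prototypes(feat, labels, num_classes):
    """Prototype rho_k = average feature over the pixels labelled k."""
    c = feat.shape[0]
    flat = feat.reshape(c, -1)            # (C, H*W)
    lab = labels.reshape(-1)              # (H*W,)
    protos = np.zeros((num_classes, c))
    for k in range(num_classes):
        mask = lab == k
        if mask.any():                    # classes absent from the image stay zero
            protos[k] = flat[:, mask].mean(axis=1)
    return protos


def compactness_loss(feat, labels, protos):
    """Mean squared distance between each pixel feature and its class prototype."""
    c = feat.shape[0]
    flat = feat.reshape(c, -1).T          # (H*W, C)
    lab = labels.reshape(-1)
    return float(np.mean(np.sum((flat - protos[lab]) ** 2, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f_w = rng.normal(size=(8, 4, 4))                  # weak-view features
    f_s = f_w + 0.1 * rng.normal(size=f_w.shape)      # strong-view features
    labels = rng.integers(0, 3, size=(4, 4))          # hypothetical pseudo-labels
    protos = class_prototypes(f_w, labels, num_classes=3)
    print(point_to_point_loss(f_s, f_w), compactness_loss(f_s, labels, protos))
```

In this sketch the prototypes are computed from the weak view and the strong-view features are pulled towards them; which view supplies the prototypes in MCCL is not stated in the abstract, so that choice is also an assumption.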

Paper Structure

This paper contains 16 sections, 26 equations, 9 figures, 9 tables, 1 algorithm.

Figures (9)

  • Figure 1: Explanation of motivation. (a) Comparison of the consistency ratio with previous SOTA methods across different feature similarity ranges between Strongly and Weakly augmented View (SWV) features. Since higher similarity of SWV features leads to better prediction consistency, we propose to further enhance consistency learning with Image-augmentation based Feature (IF) consistency to generate more similar SWV features. (b) Comparison of the pixel ratio across different feature similarity ranges. Our proposed feature knowledge alignment (FKA) strategy effectively increases the ratio of pixels with higher-similarity SWV features.
  • Figure 2: Comparison with previous methods. (a) Existing methods primarily emphasize Image-augmentation based Prediction (IP) consistency. (b) In contrast, our approach focuses more on Image-augmentation based Feature (IF) consistency and Feature-perturbation based Prediction (FP) consistency. FKA encourages the encoder to perform consistency learning on the features ($F_s$ and $F_w$) of the strongly and weakly augmented views ($u_s$ and $u_w$). SAI adaptively intervenes in $F_s$ to generate specific perturbed features ($F_{ne}$ and $F_{mk}$). The predictions ($p_n$ and $p_m$) of $F_{ne}$ and $F_{mk}$ are supervised by the weakly augmented view prediction $p_w$ to achieve FP and IP consistency.
  • Figure 3: The network architecture of Multi-Constraint Consistency Learning. The training process for labeled and unlabeled images is conducted simultaneously. The black area in $F_{mk}$ indicates that the area is filled with zero. Aug is the abbreviation for augmentation.
  • Figure 4: Visualization of self-adaptive feature masking (SFM). Bluer regions have lower activation, while redder regions have higher activation. White regions in $G_t$ represent areas where activation needs to be filtered.
  • Figure 5: Impact of Image-augmentation based Feature (IF) consistency on model performance using the original Pascal VOC2012 dataset. (a) Similarity variation curve of strongly and weakly augmented view features under the 1464-labeled-image setting. (b) Comparison with other state-of-the-art methods under different labeled image settings. UniMatch$^*$ denotes a variant of UniMatch with dissimilar features.
  • ...and 4 more figures
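
The captions of Figures 2 to 4 describe SAI as producing a masked feature map $F_{mk}$ (with the masked area filled with zero) and a noise-injected feature map $F_{ne}$ from $F_s$. The NumPy sketch below illustrates that idea only. The rules used here are assumptions, not the paper's definitions: the activation map is taken as the channel mean, positions above a per-image quantile are zeroed, and the noise is Gaussian with a standard deviation tied to the instance's own feature statistics. The function names and the parameters `keep_quantile` and `scale` are hypothetical.

```python
import numpy as np


def mask_features(f_s, keep_quantile=0.8):
    """Zero out the most activated spatial positions of f_s (shape C, H, W).

    Assumption: the threshold is a per-image quantile of the channel-mean
    activation map, which makes the mask instance-specific.
    """
    act = f_s.mean(axis=0)                       # (H, W) activation map
    thresh = np.quantile(act, keep_quantile)     # per-instance threshold
    keep = act <= thresh                         # True where features are kept
    return f_s * keep[None, :, :], keep


def inject_noise(f_s, scale=0.1, rng=None):
    """Add Gaussian noise whose strength follows the instance's feature std."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = scale * f_s.std()                    # per-instance noise level
    return f_s + sigma * rng.normal(size=f_s.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f_s = rng.normal(size=(8, 6, 6))             # stand-in for encoder features
    f_mk, keep = mask_features(f_s)
    f_ne = inject_noise(f_s, rng=rng)
    print(keep.mean(), np.abs(f_ne - f_s).mean())
```

Both perturbed maps would then be passed through the decoder, and the resulting predictions $p_m$ and $p_n$ compared against $p_w$, as described in the Figure 2 caption.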