Boundary-Refined Prototype Generation: A General End-to-End Paradigm for Semi-Supervised Semantic Segmentation

Junhao Dong, Zhu Meng, Delong Liu, Jiaxuan Liu, Zhicheng Zhao, Fei Su

TL;DR

This work addresses the limitations of non-end-to-end prototype-based semi-supervised semantic segmentation by introducing Boundary-Refined Prototype Generation (BRPG), which integrates online clustering, confidence-aware prototype sampling, and adaptive prototype augmentation into a mean-teacher training framework. BRPG comprises two main components, both updated online during training: Confidence-Based Prototype Generation (CPG), which creates high- and low-confidence prototypes to better capture class boundaries, and Adaptive Prototype Optimization (APO), which increases the number of prototypes for dispersed classes. Prototype-based contrastive learning then aligns pixel features with these prototypes, improving intra-class compactness and sharpening decision boundaries. Optimization is guided by the losses $L_s$, $L_u$, and $L_{pro}$, combined as $L = L_s + \lambda_u L_u + \lambda_{pro} L_{pro}$, and prototypes are updated as $r_{c,k}^{(t)}=\alpha r_{c,k}^{(t-1)}+(1-\alpha) \bar{f}_{c,k}$. The approach performs well across PASCAL VOC 2012, Cityscapes, and MS COCO, achieving state-of-the-art results under multiple supervision levels and with different network architectures, including DeepLabV3+ and SegFormer. This suggests that BRPG is practical for learning efficiently from unlabeled data in diverse, real-world segmentation tasks.
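The two formulas in the summary can be sketched in a few lines of NumPy. This is only an illustrative reading, not the authors' implementation: the function names, the default weights `lambda_u`, `lambda_pro`, and `alpha`, and the interpretation of $\bar{f}_{c,k}$ as the mean of the features currently assigned to prototype $k$ of class $c$ are all assumptions.

```python
import numpy as np


def total_loss(l_s, l_u, l_pro, lambda_u=1.0, lambda_pro=1.0):
    """L = L_s + lambda_u * L_u + lambda_pro * L_pro.

    The default weights are placeholders, not values from the paper.
    """
    return l_s + lambda_u * l_u + lambda_pro * l_pro


def ema_update(prototype, features, alpha=0.99):
    """r^(t) = alpha * r^(t-1) + (1 - alpha) * f_bar.

    `prototype` has shape (d,), `features` has shape (n, d).
    f_bar is assumed to be the mean of the assigned features.
    """
    f_bar = np.asarray(features, dtype=float).mean(axis=0)
    return alpha * np.asarray(prototype, dtype=float) + (1.0 - alpha) * f_bar


if __name__ == "__main__":
    proto = np.zeros(2)
    feats = np.array([[1.0, 1.0], [3.0, 3.0]])
    print(ema_update(proto, feats, alpha=0.5))
    print(total_loss(1.0, 2.0, 3.0, lambda_u=0.5, lambda_pro=0.1))
```

A value of `alpha` close to 1 makes each prototype drift slowly, which keeps it stable while the feature extractor is still changing during training.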

Abstract

Semi-supervised semantic segmentation has attracted increasing attention in computer vision, aiming to leverage unlabeled data through latent supervision. To achieve this goal, prototype-based classification has been introduced and has achieved considerable success. However, current approaches isolate prototype generation from the main training framework, resulting in a non-end-to-end workflow. Furthermore, most methods directly perform K-Means clustering on features to generate prototypes, which places the prototypes close to category semantic centers while overlooking the clear delineation of class boundaries. To address these problems, we propose a novel end-to-end boundary-refined prototype generation (BRPG) method. Specifically, we perform online clustering on sampled features to incorporate prototype generation into the whole training framework. In addition, to enhance the classification boundaries, we sample and cluster high- and low-confidence features separately based on confidence estimation, facilitating the generation of prototypes closer to the class boundaries. Moreover, an adaptive prototype optimization strategy is proposed to increase the number of prototypes for categories with scattered feature distributions, which further refines the class boundaries. Extensive experiments demonstrate the robustness and scalability of our method across diverse datasets, segmentation networks, and semi-supervised frameworks, outperforming state-of-the-art approaches on three benchmark datasets: PASCAL VOC 2012, Cityscapes, and MS COCO. The code is available at https://github.com/djh-dzxw/BRPG.
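The idea of sampling and clustering high- and low-confidence features separately can be illustrated with a minimal NumPy sketch for a single class. Everything here is an assumption made for illustration: the function names, the plain Euclidean Lloyd's k-means (the paper's online clustering may differ), and the default threshold of 0.8, which is borrowed from the Figure 1 caption.

```python
import numpy as np


def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns centroids of shape (k, d)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centres from k distinct data points.
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each point to its nearest centre (squared Euclidean).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def confidence_prototypes(feats, conf, threshold=0.8, k_high=1, k_low=1):
    """Cluster the high- and low-confidence features of ONE class separately.

    Assumes each group contains at least as many features as prototypes.
    Returns (high_confidence_prototypes, low_confidence_prototypes).
    """
    feats = np.asarray(feats, dtype=float)
    conf = np.asarray(conf, dtype=float)
    high = feats[conf >= threshold]
    low = feats[conf < threshold]
    return kmeans(high, k_high), kmeans(low, k_low)


if __name__ == "__main__":
    feats = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
    conf = np.array([0.9, 0.95, 0.5, 0.6])
    high_protos, low_protos = confidence_prototypes(feats, conf)
    print(high_protos, low_protos)
```

Clustering all four features together with a single prototype would place it at the overall mean. Splitting by confidence instead yields one prototype near the confident core and another among the less confident features, which tend to lie closer to the class boundary.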

Paper Structure (28 sections, 23 equations, 13 figures, 10 tables)

This paper contains 28 sections, 23 equations, 13 figures, 10 tables.

Figures (13)

  • Figure 1: Visualization of the feature embeddings sampled in two ways. (a) Random sampling and clustering. (b) Separate sampling and clustering based on a confidence threshold of 0.8. The symbol "$\boldsymbol{\times}$" represents the generated prototypes in (a) and the high-confidence prototypes in (b), while "$\triangledown$" denotes the low-confidence prototypes, which are typically closer to the classification boundaries.
  • Figure 2: Mean cosine similarities between high- and low-confidence features and the class centers on PASCAL VOC 2012. Lower values indicate that features tend to deviate further from the class centers.
  • Figure 3: Concise flowcharts of (a) $\text{U}^2\text{PL}$, (b) PCR, and (c) Ours. Student: Student model's encoder. Teacher: Teacher model's encoder. C: Classification head. F: Feature head. Proto.: Class prototypes. R.S.: Random sampling.
  • Figure 4: An overview of our training framework. (a) presents the pretraining stage of the mean teacher model. (b) shows the proposed boundary-refined prototype generation (BRPG) method with confidence-based prototype generation (CPG) (gray region) and adaptive prototype optimization (APO) (blue region). "R.S." denotes the random sampling. (c) illustrates the overall training process with the generated prototypes.
  • Figure 5: An overview of our approach
  • ...and 8 more figures