BoundMatch: Boundary detection applied to semi-supervised segmentation

Haruya Ishikawa, Yoshimitsu Aoki

TL;DR

BoundMatch improves semi-supervised semantic segmentation, which targets the annotation bottleneck of dense pixel labeling, by introducing Boundary Consistency Regularized Multi-Task Learning (BCRM), in which a dedicated boundary head learns semantic boundaries as a separate task alongside segmentation. Two lightweight fusion modules, Boundary-Semantic Fusion (BSF) and Spatial Gradient Fusion (SGF), exchange information between the boundary and segmentation tasks in both directions, while the SAMTH baseline stabilizes teacher-student training with its Harmonious Batch Normalization (HBN) update strategy. Results across Cityscapes, BDD100K, SYNTHIA, ADE20K, and Pascal VOC show consistent gains in mIoU and boundary metrics (BIoU, BF1), with state-of-the-art performance on DINOv2-based Cityscapes and good transfer to lightweight architectures. The work indicates that explicit boundary supervision combined with simple fusion strategies improves both boundary delineation and overall segmentation in semi-supervised settings, which is of practical value for autonomous driving and related domains. BoundMatch is modular and compatible with existing consistency regularization (CR) methods, which suggests wide applicability and leaves room for extension to other geometric cues and foundation-model priors.

Abstract

Semi-supervised semantic segmentation (SS-SS) aims to mitigate the heavy annotation burden of dense pixel labeling by leveraging abundant unlabeled images alongside a small labeled set. While current consistency regularization methods achieve strong results, most do not explicitly model boundaries as a separate learning objective. In this paper, we propose BoundMatch, a novel multi-task SS-SS framework that explicitly integrates semantic boundary detection into a teacher-student consistency regularization pipeline. Our core mechanism, Boundary Consistency Regularized Multi-Task Learning (BCRM), enforces prediction agreement between teacher and student models on both segmentation masks and detailed semantic boundaries, providing complementary supervision from two independent tasks. To further enhance performance and encourage sharper boundaries, BoundMatch incorporates two lightweight fusion modules: Boundary-Semantic Fusion (BSF) injects learned boundary cues into the segmentation decoder, while Spatial Gradient Fusion (SGF) refines boundary predictions using mask gradients, yielding more reliable boundary pseudo-labels. This framework is built upon SAMTH, a strong teacher-student baseline featuring a Harmonious Batch Normalization (HBN) update strategy for improved stability. Extensive experiments on diverse datasets including Cityscapes and Pascal VOC show that BoundMatch achieves competitive performance against current state-of-the-art methods. Our approach achieves state-of-the-art results on the new Cityscapes benchmark with the DINOv2 foundation model. Ablation studies highlight BoundMatch's ability to improve boundary-specific evaluation metrics, its effectiveness in a realistic large-scale unlabeled data scenario, and its applicability to lightweight architectures for mobile deployment.
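
The teacher-student consistency idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the EMA momentum, the confidence threshold, the boundary binarization threshold, and the choice of cross-entropy and binary cross-entropy as losses are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.99):
    """Move each teacher parameter toward the student's value (EMA).
    `momentum` is an assumed value, not taken from the paper."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}

def seg_consistency_loss(student_probs, teacher_probs, threshold=0.95):
    """Cross-entropy of the student's (strong-view) class probabilities
    against the teacher's (weak-view) hard pseudo-labels, averaged over
    pixels where the teacher is confident. Inputs have shape (C, H, W).
    The confidence threshold is an assumption."""
    pseudo = teacher_probs.argmax(axis=0)          # (H, W) pseudo-labels
    confident = teacher_probs.max(axis=0) >= threshold
    # Student probability assigned to the pseudo-labeled class per pixel.
    picked = np.take_along_axis(student_probs, pseudo[None], axis=0)[0]
    ce = -np.log(np.clip(picked, 1e-8, 1.0))
    return float((ce * confident).sum() / max(confident.sum(), 1))

def boundary_consistency_loss(student_bprob, teacher_bprob, threshold=0.5):
    """Binary cross-entropy of the student's boundary probabilities against
    binarized teacher boundary pseudo-labels (assumed binarization)."""
    target = (teacher_bprob >= threshold).astype(np.float64)
    p = np.clip(student_bprob, 1e-8, 1.0 - 1e-8)
    bce = -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
    return float(bce.mean())
```

In such a sketch, the unlabeled objective would be the sum of the two losses (possibly weighted), with the teacher's outputs treated as constants and the teacher's weights refreshed by `ema_update` after each student step.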

Paper Structure

This paper contains 58 sections, 21 equations, 17 figures, 20 tables, 1 algorithm.

Figures (17)

  • Figure 1: BoundMatch applies consistency regularization (CR) to both segmentation masks and boundaries in semi-supervised semantic segmentation (SS-SS). Its core mechanism, Boundary Consistency Regularized Multi-Task Learning (BCRM), enforces this by "matching" the student's boundary predictions derived from strongly augmented inputs against boundary pseudo-labels generated by the teacher from weakly augmented inputs. To further enhance performance, lightweight fusion modules refine the teacher's initial segmentation and boundary predictions, leading to higher-quality pseudo-labels.
  • Figure 2: Illustration of the Teacher-Student framework, a common approach for consistency regularization (CR) in semi-supervised semantic segmentation (SS-SS). This framework utilizes unlabeled images by having the teacher model generate pseudo-labels from weakly augmented inputs ($u^w$). These pseudo-labels then supervise the student model, which processes corresponding strongly augmented inputs ($u^s$), thereby enforcing prediction consistency. The teacher model is typically updated via an Exponential Moving Average (EMA) of the student's weights, and its predictions are detached from the backpropagation graph (indicated by dashed lines).
  • Figure 3: Overview of the BoundMatch architecture processing labeled samples, illustrating multi-task learning (MTL) of boundary detection alongside segmentation. The Boundary-Semantic Fusion (BSF) module integrates learned boundary cues from the boundary head into the segmentation head's features, promoting sharper segmentation boundaries. Concurrently, the Spatial Gradient Fusion (SGF) module refines the boundary predictions by fusing them with the spatial gradient of the segmentation mask ($\nabla M$). This refinement improves the quality of boundary supervision for the labeled loss and is important for generating cleaner boundary pseudo-labels used in the consistency regularization process for unlabeled data.
  • Figure 4: Visualization of the output of the spatial gradient operator applied to a segmentation prediction.
  • Figure 5: Qualitative results on the $1/16$ split of the Cityscapes dataset. We compare UniMatch [Yang2022UniMatch] with our SAMTH baseline and SAMTH+BoundMatch. Our method produces fewer segmentation errors, especially at object boundaries and for difficult categories. Best viewed zoomed in and in color.
  • ...and 12 more figures
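
The spatial gradient of a segmentation mask ($\nabla M$) mentioned in the captions of Figures 3 and 4 can be sketched as follows. The exact operator used in the paper is not given in this section, so the version below assumes a per-class finite-difference gradient magnitude computed with `np.gradient`. The function name and the choice of central differences are illustrative only.

```python
import numpy as np

def spatial_gradient_magnitude(probs):
    """Per-class gradient magnitude of a segmentation prediction.

    probs: array of shape (C, H, W) with class probabilities (or a one-hot
    mask). Returns an array of the same shape that is large where a class
    map changes between neighboring pixels, i.e. near mask boundaries.
    Finite differences are an assumed stand-in for the paper's operator.
    """
    gy, gx = np.gradient(probs, axis=(1, 2))  # derivatives along H and W
    return np.sqrt(gx ** 2 + gy ** 2)

# Toy example: two classes split by a vertical edge in a 4x4 image.
mask = np.zeros((2, 4, 4))
mask[0, :, :2] = 1.0   # class 0 on the left half
mask[1, :, 2:] = 1.0   # class 1 on the right half
edges = spatial_gradient_magnitude(mask)
```

In the toy example, `edges` is nonzero only in the two columns adjacent to the vertical class transition, which is the kind of signal a module like SGF could fuse with the boundary head's predictions.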