S^4M: Boosting Semi-Supervised Instance Segmentation with SAM

Heeji Yoon, Heeseong Shin, Eunbeen Hong, Hyunwook Choi, Hansang Cho, Daun Jeong, Seungryong Kim

TL;DR

S^4M tackles semi-supervised instance segmentation under limited labeling by integrating SAM into a teacher–student framework through three core components: structural distillation to imprint SAM’s fine-grained localization into the teacher, pseudo-label refinement to improve labeling quality on unlabeled data, and instance-aware augmentation (ARP) to generate diverse, realistic training samples. The method carefully leverages SAM’s strengths while mitigating its class-agnostic tendencies, yielding state-of-the-art results on Cityscapes and COCO at very low label ratios. Extensive ablations show that distilling decoder-based self-similarity, refining pseudo-labels, and combining ARP with a strong teacher produce the largest gains, with additional insights into when and how to apply each component. Overall, S^4M demonstrates that SAM can significantly enhance semi-supervised instance segmentation when integrated with targeted distillation and augmentation strategies, offering practical improvements for data-scarce scenarios.
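To make "distilling decoder-based self-similarity" more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes features are flattened to an (N, C) array over N spatial locations, that similarity is cosine similarity, and that the two self-similarity matrices are compared with a mean squared error. The function names, the loss form, and the random stand-in features are all hypothetical choices made for illustration.

```python
import numpy as np


def self_similarity(feats: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Cosine self-similarity of an (N, C) feature array; returns (N, N)."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    normed = feats / (norms + eps)
    return normed @ normed.T


def structural_distillation_loss(teacher_feats: np.ndarray,
                                 sam_feats: np.ndarray) -> float:
    """Assumed loss: MSE between teacher and SAM self-similarity matrices."""
    s_teacher = self_similarity(teacher_feats)
    s_sam = self_similarity(sam_feats)
    return float(np.mean((s_teacher - s_sam) ** 2))


# Random stand-ins for real features (hypothetical shapes).
rng = np.random.default_rng(0)
sam_feats = rng.normal(size=(16, 32))     # 16 locations, 32 channels
teacher_feats = rng.normal(size=(16, 8))  # channel width may differ from SAM

loss_random = structural_distillation_loss(teacher_feats, sam_feats)
loss_same = structural_distillation_loss(sam_feats, sam_feats)
```

One reason a similarity-based target is convenient is that both matrices have shape (N, N) regardless of channel width, so no projection layer is needed to align the teacher's feature dimension with SAM's. The target also describes only which locations belong together, not which class they belong to.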

Abstract

Semi-supervised instance segmentation poses challenges due to limited labeled data, causing difficulties in accurately localizing distinct object instances. Current teacher-student frameworks still suffer from performance constraints due to unreliable pseudo-label quality stemming from limited labeled data. While the Segment Anything Model (SAM) offers robust segmentation capabilities at various granularities, directly applying SAM to this task introduces challenges such as class-agnostic predictions and potential over-segmentation. To address these complexities, we carefully integrate SAM into the semi-supervised instance segmentation framework, developing a novel distillation method that effectively captures the precise localization capabilities of SAM without compromising semantic recognition. Furthermore, we incorporate pseudo-label refinement as well as a specialized data augmentation with the refined pseudo-labels, resulting in superior performance. We establish state-of-the-art performance, and provide comprehensive experiments and ablation studies to validate the effectiveness of our proposed approach.

Paper Structure

This paper contains 33 sections, 4 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Analysis of pseudo-labels produced by the teacher in a teacher-student framework for semi-supervised instance segmentation. (a) Bottleneck analysis revealing that the primary limitation lies in mask quality rather than classification. Class accuracy (CA) is computed on matched pairs with IoU $> 0.5$, and segmentation quality (SQ) is the standard segmentation quality term of panoptic quality [kirillov2019panoptic]. (b) Example failure cases with correct, confident class predictions but inaccurate masks.
  • Figure 2: Overall pipeline of the proposed framework, $\mathbf{S^4M}$. We propose $\mathbf{S^4M}$, a semi-supervised instance segmentation framework that effectively leverages SAM knowledge through three key approaches. First, we improve the teacher network through structural distillation, which distills SAM's inherent spatial understanding. Then, as the student learns from unlabeled images, we apply pseudo-label refinement based on SAM's strong segmentation capability, and further enhance training with instance-aware augmentation, ARP, which leverages the improved pseudo-labels.
  • Figure 3: Illustration of structural distillation with SAM for training the teacher. We distill the self-similarity matrix extracted from SAM's decoder features into the teacher to help it address under-segmentation.
  • Figure 4: Visualization of pseudo-labels from the teacher network before (left) and after (right) refinement. With SAM, under-segmented pseudo-labels, which often contain noisy parts of nearby instances, can be refined into high-quality pseudo-labels.
  • Figure 5: Qualitative comparison on the Cityscapes dataset [cordts2016cityscapes] using 10% labeled data, comparing the baseline semi-supervised method GuidedDistillation [Wang_2022_CVPR] (top) and our approach (bottom). Compared to supervised training and the baseline method, our approach not only detects and segments instances more accurately but also exhibits higher discriminability between instances of the same class.
  • ...and 6 more figures
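
The two quantities in the Figure 1 caption can be sketched as follows. SQ is the mean IoU over matched prediction/ground-truth pairs, as in panoptic quality, and CA is the fraction of those matched pairs whose predicted class is correct. The greedy one-to-one matching and the helper names (`mask_iou`, `sq_and_ca`) are assumptions made for illustration. The paper's exact matching procedure is not given in this excerpt.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boolean masks of the same shape."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum()) / float(union)


def sq_and_ca(pred_masks, pred_classes, gt_masks, gt_classes, thr=0.5):
    """Return (SQ, CA) over pairs matched with IoU > thr.

    Matching is greedy per ground-truth instance (an assumption of this sketch).
    """
    matched_ious, correct, used = [], 0, set()
    for g_mask, g_cls in zip(gt_masks, gt_classes):
        best_iou, best_j = 0.0, None
        for j, p_mask in enumerate(pred_masks):
            if j in used:
                continue
            iou = mask_iou(p_mask, g_mask)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou > thr:
            used.add(best_j)
            matched_ious.append(best_iou)
            correct += int(pred_classes[best_j] == g_cls)
    if not matched_ious:
        return 0.0, 0.0
    n = len(matched_ious)
    return sum(matched_ious) / n, correct / n


# Toy 4x4 example: two ground-truth instances, two pseudo-labels.
gt0 = np.zeros((4, 4), dtype=bool); gt0[:2] = True
gt1 = np.zeros((4, 4), dtype=bool); gt1[2:] = True
pred0 = np.zeros((4, 4), dtype=bool); pred0[0] = True; pred0[1, :2] = True
pred1 = gt1.copy()

sq, ca = sq_and_ca([pred0, pred1], [1, 3], [gt0, gt1], [1, 2])
```

In this toy case the first pseudo-label has the right class but an imperfect mask (IoU 0.75), and the second has a perfect mask but the wrong class, giving SQ = 0.875 and CA = 0.5. Reporting the two numbers separately is what lets the bottleneck analysis attribute errors to mask quality or to classification.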