SemiSAM+: Rethinking Semi-Supervised Medical Image Segmentation in the Era of Foundation Models

Yichi Zhang, Bohao Lv, Le Xue, Wenbo Zhang, Yuchen Liu, Yu Fu, Yuan Cheng, Yuan Qi

TL;DR

The paper tackles the annotation bottleneck in medical image segmentation by proposing SemiSAM+, a foundation-model-driven SSL framework that fuses a trainable specialist with frozen SAM-like generalist models. The specialist generates positional prompts from its outputs to interact with the generalists, while the generalists provide pseudo-labels and uncertainty-guided supervision to the specialist through a confidence-aware regularization mechanism. Experiments on Left Atrium, BraTS 2019, and an in-house PET dataset demonstrate significant gains under extremely limited labels and show the method’s plug-and-play applicability across different specialist and generalist configurations. This work offers a practical path toward annotation-efficient medical segmentation with potential clinical impact.

Abstract

Deep learning-based medical image segmentation typically requires a large amount of labeled data for training, which makes it less applicable in clinical settings because of the high annotation cost. Semi-supervised learning (SSL) has emerged as an appealing strategy because it depends less on abundant expert annotations than fully supervised methods do. Beyond existing model-centric advances in SSL that design novel regularization strategies, we anticipate a paradigm shift driven by the emergence of promptable segmentation foundation models, represented by the Segment Anything Model (SAM), which offer universal segmentation capabilities through positional prompts. In this paper, we present SemiSAM+, a foundation model-driven SSL framework that learns efficiently from limited labeled data for medical image segmentation. SemiSAM+ consists of one or multiple promptable foundation models acting as generalist models, and a trainable task-specific segmentation model acting as the specialist model. For a given new segmentation task, training follows a specialist-generalist collaborative learning procedure: the trainable specialist model delivers positional prompts to the frozen generalist models to acquire pseudo-labels, and the generalist models' output in turn provides the specialist model with informative and efficient supervision, which benefits both automatic segmentation and prompt generation. Extensive experiments on two public datasets and one in-house clinical dataset demonstrate that SemiSAM+ achieves significant performance improvement, especially under extremely limited annotation scenarios, and shows strong efficiency as a plug-and-play strategy that can be easily adapted to different specialist and generalist models.
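The collaborative step described in the abstract can be illustrated with a minimal NumPy sketch for a single unlabeled image. Everything specific here is an assumption made for illustration, not the paper's formulation: the prompt is a single point near the centroid of the specialist's predicted foreground, the confidence weight is one minus the mean normalized entropy of the specialist's prediction, the consistency term is a mean squared error, and `generalist` is a hypothetical callable standing in for a frozen SAM-like model.

```python
import numpy as np

def point_prompt_from_prediction(prob, threshold=0.5):
    """Derive one point prompt from the specialist's probability map.

    Assumption: use the predicted-foreground pixel closest to the foreground
    centroid. Returns None if nothing is predicted as foreground.
    """
    fg = np.argwhere(prob > threshold)
    if fg.size == 0:
        return None
    centroid = fg.mean(axis=0)
    nearest = fg[np.argmin(((fg - centroid) ** 2).sum(axis=1))]
    return tuple(int(i) for i in nearest)

def confidence_weight(prob, eps=1e-8):
    """Scalar in [0, 1]: one minus the mean normalized binary entropy.

    Assumption: an uncertain specialist yields unreliable prompts, so the
    generalist's supervision is down-weighted in that case.
    """
    p = np.clip(prob, eps, 1.0 - eps)
    entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p)) / np.log(2.0)
    return float(1.0 - entropy.mean())

def generalist_consistency_loss(image, prob, generalist):
    """Confidence-weighted consistency between specialist and generalist.

    `generalist(image, point)` is a hypothetical frozen model that returns a
    binary mask with the same shape as `prob`.
    """
    point = point_prompt_from_prediction(prob)
    if point is None:
        return 0.0  # no prompt available, so no generalist supervision
    pseudo_label = np.asarray(generalist(image, point), dtype=np.float64)
    mse = float(np.mean((prob - pseudo_label) ** 2))
    return confidence_weight(prob) * mse

# Toy usage: a 4x4 "image" and a stand-in generalist that thresholds intensity.
image = np.zeros((4, 4))
image[1:3, 1:3] = 1.0
specialist_prob = np.zeros((4, 4))
specialist_prob[1:3, 1:3] = 0.9
toy_generalist = lambda img, point: img > 0.5
loss = generalist_consistency_loss(image, specialist_prob, toy_generalist)
```

In a full training loop this term would be added to the supervised loss on labeled data and to whatever SSL regularization the specialist already uses; only the specialist's parameters would be updated, since the generalist stays frozen.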

Paper Structure

This paper contains 19 sections, 14 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Semi-supervised learning aims to utilize unlabeled data in conjunction with a limited amount of labeled data to improve performance (from blue to green). Existing model-centric advances in SSL aim to use unlabeled data more efficiently for better performance (from green to purple). SemiSAM+ represents a new paradigm that exploits the pre-trained knowledge of foundation models to assist SSL (from green to red).
  • Figure 2: An overview of a segmentation foundation model, represented by the Segment Anything Model (SAM), which adopts an image encoder to extract image embeddings, a prompt encoder to integrate user interactions via different prompt modes, and a mask decoder to predict segmentation masks by fusing image embeddings and prompt embeddings. For any given dataset, the model can segment any target in the image based on positional prompts, which demonstrates universal segmentation capability on new tasks.
  • Figure 3: Comparison of existing methods with our proposed method. (a) Specialist model for training-based task-specific automatic segmentation. (b) Generalist model for universal promptable interactive segmentation. (c) SemiSAM+: specialist-generalist collaborative learning for annotation-efficient automatic segmentation.
  • Figure 4: Overview of the proposed foundation model-driven annotation-efficient learning framework SemiSAM+. Specifically, SemiSAM+ consists of a trainable task-specific segmentation model as the specialist model for the downstream segmentation task, and one or multiple promptable foundation models as generalist models, pre-trained on large-scale datasets and capable of zero-shot generalization. In SemiSAM+, an additional confidence-aware regularization is adopted to make effective use of the generalist models' supervision and to avoid possible misguidance when annotations are extremely limited.
  • Figure 5: Comparison of Dice performance of adapting SemiSAM+ to different SSL specialist models for left atrium segmentation and brain tumor segmentation under different SSL settings.
  • ...and 2 more figures
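
Figure 5 reports Dice performance. For reference, the Dice coefficient between a predicted mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). A minimal NumPy sketch follows; the function name and the convention of returning 1.0 when both masks are empty are assumptions made here, not details taken from the paper.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, target).sum() / denom)

# Example: one overlapping pixel, two predicted, one in the ground truth.
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])  # 2 * 1 / (2 + 1)
```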