
Stitching, Fine-tuning, Re-training: A SAM-enabled Framework for Semi-supervised 3D Medical Image Segmentation

Shumeng Li, Lei Qi, Qian Yu, Jing Huo, Yinghuan Shi, Yang Gao

TL;DR

This work presents SFR, a three-stage framework (Stitching, Fine-tuning, Re-training) for SAM-based semi-supervised 3D medical image segmentation. Stitching combines slices of each 3D volume into larger 2D images. SAM is then fine-tuned on this stitched data with parameter-efficient methods (e.g., LoRA) to provide a robust initialization of pseudo-labels. The re-training stage trains a 3D semi-supervised learning (SSL) model on the labeled data and the SAM-generated pseudo-labels, and the downstream model keeps the same size as a conventional segmenter such as V-Net. An extended variant, SFR+, adds confidence estimation to select which data are used for fine-tuning and re-training. Across five datasets, SFR and SFR+ achieve substantial gains under both moderate and scarce annotation. On LA with only one labeled volume, SFR raises the Dice score of Mean Teacher from 29.68% to 74.40%, and in several cases performance approaches or matches full supervision. The framework is compatible with diverse SSL methods and offers a practical, plug-and-play path to efficient, foundation-model-guided semi-supervised medical image segmentation.
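The TL;DR names LoRA as one parameter-efficient way to fine-tune SAM on the stitched data. Below is a minimal NumPy sketch of the generic LoRA idea on a single linear layer. It is not the authors' code. The class name `LoRALinear`, the rank, the scaling `alpha` and the initialisation scale are illustrative assumptions, and this summary does not say which SAM layers SFR adapts.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update: y = W x + (alpha / r) * B (A x).

    Hypothetical helper for illustration; rank and alpha are assumed values.
    """

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 8.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight = weight                                       # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, in_features))   # trainable down-projection
        self.B = np.zeros((out_features, rank))                    # trainable up-projection, zero-init
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (..., in_features) -> (..., out_features)
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def num_trainable(self) -> int:
        # Only A and B would receive gradient updates during fine-tuning.
        return self.A.size + self.B.size


rng = np.random.default_rng(42)
W = rng.normal(size=(256, 256))
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, 256))
# Because B starts at zero, the adapted layer initially reproduces the frozen layer.
assert np.allclose(layer(x), x @ W.T)
print(layer.num_trainable(), W.size)  # prints: 2048 65536
```

The zero initialisation of `B` means fine-tuning starts exactly from the pretrained model. The low-rank factors here hold 2,048 values versus 65,536 in the frozen 256 × 256 weight, which is the source of the parameter efficiency.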

Abstract

Segment Anything Model (SAM) fine-tuning has shown remarkable performance in fully supervised medical image segmentation, but it requires precise annotations. To reduce the annotation cost while maintaining satisfactory performance, we leverage the capabilities of SAM to build semi-supervised medical image segmentation models. Rethinking the requirements of effectiveness, efficiency, and compatibility, we propose a three-stage framework: Stitching, Fine-tuning, and Re-training (SFR). Current fine-tuning approaches mostly rely on 2D slice-wise fine-tuning, which disregards the contextual information between adjacent slices. Our stitching strategy mitigates the mismatch between natural images and 3D medical images. The stitched images are then used to fine-tune SAM, providing a robust initialization of pseudo-labels. Afterwards, we train a 3D semi-supervised segmentation model that keeps the same parameter size as a conventional segmenter such as V-Net. Our SFR framework is plug-and-play and easily compatible with various popular semi-supervised methods. We also develop an extended framework, SFR$^+$, with selective fine-tuning and re-training through confidence estimation. Extensive experiments validate that SFR and SFR$^+$ achieve significant improvements under both moderate and scarce annotation across five datasets. In particular, SFR improves the Dice score of Mean Teacher from 29.68% to 74.40% with only one labeled volume of the LA dataset.
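The abstract describes stitching only at a high level. Below is a minimal NumPy sketch under stated assumptions: stitching tiles d × d consecutive axial slices row-major into one larger 2D image (Figure 3 uses d = 4), and the depth is a multiple of d². The functions `stitch` and `unstitch` are hypothetical names. The inverse mapping shows how 2D predictions on stitched images could be turned back into 3D pseudo-labels for the re-training stage. As arithmetic only, 16 slices of 256 × 256 would fill a 1024 × 1024 image, the input size of SAM's image encoder.

```python
import numpy as np

def stitch(volume: np.ndarray, d: int) -> np.ndarray:
    """Tile groups of d*d consecutive slices into d x d grids (assumed row-major layout).

    volume: (D, H, W) with D divisible by d*d.
    returns: (D // (d*d), d*H, d*W) stitched 2D images.
    """
    D, H, W = volume.shape
    if D % (d * d) != 0:
        raise ValueError("depth must be a multiple of d*d; pad the volume first")
    n = D // (d * d)
    grid = volume.reshape(n, d, d, H, W)                  # (image, grid_row, grid_col, H, W)
    return grid.transpose(0, 1, 3, 2, 4).reshape(n, d * H, d * W)


def unstitch(images: np.ndarray, d: int) -> np.ndarray:
    """Inverse of stitch: map stitched 2D maps (e.g. pseudo-labels) back to (D, H, W)."""
    n, dH, dW = images.shape
    H, W = dH // d, dW // d
    grid = images.reshape(n, d, H, d, W).transpose(0, 1, 3, 2, 4)  # (image, row, col, H, W)
    return grid.reshape(n * d * d, H, W)


vol = np.arange(32 * 64 * 64, dtype=np.float32).reshape(32, 64, 64)
imgs = stitch(vol, d=4)
print(imgs.shape)  # prints: (2, 256, 256)
assert np.array_equal(unstitch(imgs, d=4), vol)
```

Unlike slice-wise fine-tuning, each stitched image places neighbouring slices side by side, so one SAM forward pass sees several adjacent slices at native resolution.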

Paper Structure

This paper contains 36 sections, 8 equations, 12 figures, and 15 tables.

Figures (12)

  • Figure 1: Comparison of our SFR framework with extended foundation models and semi-supervised medical segmentation methods on the LA dataset [xiong2021global] with 16 labeled volumes.
  • Figure 2: Overview of the proposed SFR framework, which includes three modules: Stitching, Fine-tuning and Re-training.
  • Figure 3: Comparison of different input strategies. Small-size fine-tuning reduces the input size through bilinear interpolation, while upsampling fine-tuning directly upsamples each slice. $d = 4$ is taken as an example.
  • Figure 4: Disrupting slice continuity. Comparison with random slice rotation and flipping, and with random slice shuffling, taking $d = 3$ as an example.
  • Figure 5: Disrupting contextual integrity. Comparison with stitching with natural images, taking $d = 3$ as an example.
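Figure 4 tests whether stitching benefits from slice continuity by perturbing slices before they are stitched. Below is a hedged NumPy sketch of the two perturbations it names. It assumes square slices, a row-major d × d tiling, 90° rotations and horizontal flips. The exact augmentation settings are not given in this summary, and `stitch`, `shuffle_slices` and `rotate_flip_slices` are hypothetical names.

```python
import numpy as np

def stitch(volume: np.ndarray, d: int) -> np.ndarray:
    """Row-major d x d tiling of consecutive slices; trailing slices that do not fill a grid are dropped."""
    n, H, W = volume.shape[0] // (d * d), volume.shape[1], volume.shape[2]
    grid = volume[: n * d * d].reshape(n, d, d, H, W)
    return grid.transpose(0, 1, 3, 2, 4).reshape(n, d * H, d * W)


def shuffle_slices(volume: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random shuffling: breaks the ordering of adjacent slices before stitching."""
    return volume[rng.permutation(volume.shape[0])]


def rotate_flip_slices(volume: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random per-slice rotation by a multiple of 90 degrees plus an optional horizontal flip.

    Assumes square slices so every rotated slice keeps the shape (H, W).
    """
    out = []
    for sl in volume:
        sl = np.rot90(sl, k=int(rng.integers(4)))
        if rng.random() < 0.5:
            sl = np.flip(sl, axis=1)
        out.append(sl)
    return np.stack(out)


rng = np.random.default_rng(0)
vol = rng.random((9, 32, 32))          # d = 3, as in Figure 4
baseline = stitch(vol, d=3)
shuffled = stitch(shuffle_slices(vol, rng), d=3)
rotated = stitch(rotate_flip_slices(vol, rng), d=3)
print(baseline.shape, shuffled.shape, rotated.shape)  # prints: (1, 96, 96) (1, 96, 96) (1, 96, 96)
```

Both perturbations keep the pixel content and the stitched image size unchanged and alter only how neighbouring slices relate. Any performance drop in such an ablation can therefore be attributed to the lost inter-slice context rather than to different inputs.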