Slide-SAM: Medical SAM Meets Sliding Window
Quan Quan, Fenghe Tang, Zikang Xu, Heqin Zhu, S. Kevin Zhou
TL;DR
This work addresses the challenge of applying a pre-trained 2D segmentation model (SAM) to 3D medical images by introducing Slide-SAM, which uses a three-slice sliding window to predict masks for adjacent slices simultaneously from prompts on the central slice. It preserves SAM’s pre-trained strengths by freezing the backbone and reusing decoder weights, while enabling efficient multi-slice inference through LoRA-based fine-tuning and a hybrid loss that accommodates both 3D labels and SAM-generated 2D pseudo-labels. Empirical results on the CHAOS, BTCV, WORD, and MSD datasets show improved 3D segmentation with minimal prompts, higher annotation efficiency, and robustness to noisy prompts, highlighting Slide-SAM’s potential to accelerate clinical annotation workflows. The approach combines architectural adaptation, data enrichment, and a task-aligned loss to achieve coherent 3D segmentations with practical inference speed and memory usage.
Abstract
The Segment Anything Model (SAM) has achieved notable success in two-dimensional segmentation of natural images. However, the substantial gap between medical and natural images hinders its direct application to medical image segmentation tasks. In 3D medical images in particular, SAM struggles to learn contextual relationships between slices, which limits its practical applicability. Moreover, applying 2D SAM to 3D images requires prompting the entire volume, which is costly in both time and labeling effort. To address these problems, we propose Slide-SAM, which treats a stack of three adjacent slices as a prediction window. It first takes three slices from a 3D volume, together with point or bounding-box prompts on the central slice, as input and predicts segmentation masks for all three slices. The masks of the top and bottom slices are then used to generate new prompts for adjacent slices. Finally, step-wise prediction is achieved by sliding the prediction window forward or backward through the entire volume. Our model is trained on multiple public and private medical datasets and demonstrates its effectiveness through extensive 3D segmentation experiments with minimal prompts. Code is available at https://github.com/Curli-quan/Slide-SAM.
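The sliding-window procedure described in the abstract can be illustrated with a minimal sketch. This is not the released implementation: `toy_predictor` is a hypothetical stand-in for the model (it simply thresholds intensities inside the prompt box), `mask_to_box` and `slide_segment` are illustrative names, and the choice to make the edge slice of one window the central slice of the next is an assumption, since the abstract does not fix the stride.

```python
import numpy as np

def mask_to_box(mask):
    """Bounding box (x0, y0, x1, y1) of a binary mask, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def toy_predictor(window, box):
    """Hypothetical stand-in for the model: threshold all three slices inside the box."""
    x0, y0, x1, y1 = box
    masks = np.zeros(window.shape, dtype=bool)
    masks[:, y0:y1 + 1, x0:x1 + 1] = window[:, y0:y1 + 1, x0:x1 + 1] > 0.5
    return masks

def slide_segment(volume, start, box, predict=toy_predictor):
    """Segment a (depth, height, width) volume from one box prompt on slice `start`."""
    depth = volume.shape[0]
    if depth < 3 or not 1 <= start <= depth - 2:
        raise ValueError("start must be an interior slice of a volume with >= 3 slices")
    seg = np.zeros(volume.shape, dtype=bool)
    # First window: the user prompt sits on the central slice.
    seg[start - 1:start + 2] = predict(volume[start - 1:start + 2], box)
    for step in (1, -1):  # slide forward, then backward
        center = start + step
        prompt = mask_to_box(seg[center])
        # Stop at the volume boundary or when the object disappears.
        while 1 <= center <= depth - 2 and prompt is not None:
            masks = predict(volume[center - 1:center + 2], prompt)
            edge = center + step
            seg[edge] = masks[1 + step]  # keep only the newly reached slice
            center = edge                # assumption: the edge slice becomes the next center
            prompt = mask_to_box(seg[center])
    return seg

# Synthetic volume: a bright block occupying slices 2..5.
volume = np.zeros((8, 16, 16), dtype=float)
volume[2:6, 4:10, 5:11] = 1.0
seg = slide_segment(volume, start=3, box=(5, 4, 10, 9))
```

In this sketch a single box prompt on slice 3 is enough to recover the whole block: each window's edge mask is converted into the box prompt for the next window, and sliding stops once the predicted mask is empty or the volume boundary is reached.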
