Medical SAM 2: Segment medical images as video via Segment Anything Model 2

Jiayuan Zhu; Abdullah Hamdi; Yunli Qi; Yueming Jin; Junde Wu

TL;DR

MedSAM-2 reinterprets medical image segmentation as a video-like, auto-tracking problem by extending SAM2 with a self-sorting memory bank. This memory mechanism selects informative, diverse embeddings based on confidence and dissimilarity, enabling One-Prompt Segmentation for 2D data and robust 3D segmentation across unordered slices. Across 25 tasks and 14 benchmarks, MedSAM-2 achieves state-of-the-art results while requiring less user prompting, and it generalizes well to unseen modalities. The approach combines memory-augmented attention with multi-orientation 3D processing, offering a practical, low-interaction solution for clinical workflows.
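As a rough illustration of the "image volume as video" idea, the sketch below turns a 3D array into an ordered sequence of 2D frames along any of its three axes. It is not the authors' code: the function name `volume_to_frames` and the toy volume are assumptions made for this example, and which axis corresponds to the axial, coronal, or sagittal view depends on how a given dataset is stored.

```python
import numpy as np

def volume_to_frames(volume, axis=0):
    """Return a view of a 3D volume as a stack of 2D frames.

    Hypothetical helper: the chosen axis becomes the leading "time"
    axis, so frames[i] is the i-th slice along that orientation.
    """
    volume = np.asarray(volume)
    if volume.ndim != 3:
        raise ValueError("expected a 3D volume")
    return np.moveaxis(volume, axis, 0)

# Toy volume standing in for a CT or MRI scan (assumed shape, not real data).
volume = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Slice the same volume along each of its three orientations.
for axis in range(3):
    frames = volume_to_frames(volume, axis)
    print(f"axis={axis}: {frames.shape[0]} frames of shape {frames.shape[1:]}")
```

Each resulting frame sequence could then be passed to a video-style segmenter, with a prompt given on one frame only.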

Abstract

Medical image segmentation plays a pivotal role in clinical diagnostics and treatment planning, yet existing models often face challenges in generalization and in handling both 2D and 3D data uniformly. In this paper, we introduce Medical SAM 2 (MedSAM-2), a generalized auto-tracking model for universal 2D and 3D medical image segmentation. The core concept is to leverage the Segment Anything Model 2 (SAM2) pipeline to treat all 2D and 3D medical segmentation tasks as a video object tracking problem. To put it into practice, we propose a novel self-sorting memory bank mechanism that dynamically selects informative embeddings based on confidence and dissimilarity, regardless of temporal order. This mechanism not only significantly improves performance in 3D medical image segmentation but also unlocks a One-Prompt Segmentation capability for 2D images, allowing segmentation across multiple images from a single prompt without temporal relationships. We evaluated MedSAM-2 on five 2D tasks and nine 3D tasks, including white blood cells, optic cups, retinal vessels, mandibles, coronary arteries, kidney tumors, liver tumors, breast cancer, nasopharynx cancer, vestibular schwannoma, mediastinal lymph nodules, cerebral artery, inferior alveolar nerve, and abdominal organs, comparing it against state-of-the-art (SOTA) models in task-tailored, general and interactive segmentation settings. Our findings demonstrate that MedSAM-2 surpasses a wide range of existing models and sets new SOTA results on several benchmarks. The code is released on the project page: https://supermedintel.github.io/Medical-SAM2/.
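The abstract describes the memory bank only at a high level, so the following is a minimal sketch of one way such a mechanism could work, not the paper's actual algorithm. Everything specific here is an assumption made for illustration: the class name `SelfSortingMemoryBank`, the `capacity` and `min_confidence` parameters, the use of cosine similarity as the dissimilarity measure, and the eviction rule (drop the less confident member of the most similar pair).

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1D vectors (epsilon guards zero norms)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

class SelfSortingMemoryBank:
    """Hypothetical bank keeping confident, mutually dissimilar embeddings."""

    def __init__(self, capacity=4, min_confidence=0.5):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.min_confidence = min_confidence
        self.entries = []  # list of (confidence, embedding), best first

    def update(self, embedding, confidence):
        """Offer a new embedding; return False if it is rejected outright."""
        if confidence < self.min_confidence:
            return False  # low-confidence predictions never enter the bank
        candidates = self.entries + [(confidence, np.asarray(embedding, dtype=float))]
        while len(candidates) > self.capacity:
            # Find the most similar (most redundant) pair of embeddings.
            best = None
            for i in range(len(candidates)):
                for j in range(i + 1, len(candidates)):
                    sim = cosine_similarity(candidates[i][1], candidates[j][1])
                    if best is None or sim > best[0]:
                        best = (sim, i, j)
            _, i, j = best
            # Evict the less confident member of that pair.
            candidates.pop(i if candidates[i][0] < candidates[j][0] else j)
        # Order by confidence, independent of the order frames arrived in.
        candidates.sort(key=lambda entry: entry[0], reverse=True)
        self.entries = candidates
        return True

bank = SelfSortingMemoryBank(capacity=2, min_confidence=0.5)
bank.update([1.0, 0.0], confidence=0.9)
bank.update([0.99, 0.1], confidence=0.8)  # near-duplicate of the first
bank.update([0.0, 1.0], confidence=0.7)   # dissimilar, so it displaces the duplicate
rejected = bank.update([0.5, 0.5], confidence=0.3)
print([conf for conf, _ in bank.entries], rejected)
```

Because selection depends only on confidence and similarity, the order in which images arrive does not matter, which is the property that lets a single prompt carry across 2D images with no temporal relationship.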

Paper Structure (24 sections, 13 equations, 7 figures, 3 tables)

Introduction
Related Works
Method
Preliminaries on Segment Anything Model (SAM 2)
MedSAM-2: Self-Sorting SAM2 for Medical Imaging
Unified Approach for 2D and 3D Images
Experiment
Dataset
Human-User Prompted Evaluation
Implementation
Results
Performance of Universal Medical Image Segmentation
One-prompt Segmentation Performance under different prompts
Analysis and Ablation Study
Conclusion
...and 9 more sections

Figures (7)

Figure 1: Segmentation Capabilities of MedSAM-2. When provided with a prompt in one 3D slice, MedSAM-2 can segment all later spatial-temporal 3D frames. When given a prompt in one 2D image, MedSAM-2 can accurately segment other 2D images that are not temporally related using the same criteria, demonstrating an emergent One-Prompt Segmentation capability.
Figure 2: MedSAM-2 Framework. Building on the SAM 2 framework, we propose treating 3D medical images and 2D medical image flows as videos to facilitate memory-enhanced medical image segmentation. This approach not only improves performance in 3D medical image segmentation but also unlocks One-Prompt Segmentation capability for 2D medical image flows. This is achieved by incorporating our proposed Self-Sorting Memory Bank, which selects the most confident embeddings based on the confidence predictions ($\alpha$, $\beta$, $\gamma$) from the mask decoder.
Figure 3: Qualitative Comparison on 3D Medical Image Segmentation. We compare MedSAM, our MedSAM-2, and the ground truth on sequential 3D medical image segmentation on the BTCV dataset [fang2020multi]. Note how MedSAM-2 produces more consistent 3D predictions than MedSAM by leveraging the 3D context while maintaining high generalization capability.
Figure 4: Qualitative Examples of MedSAM-2 for 2D One-Prompt Segmentation & 3D Segmentation. We show several examples of 2D segmentation on diverse datasets.
Figure 5: One-prompt 2D Segmentation Performance. We compare MedSAM-2 with few/one-shot models under the One-Prompt Segmentation setting on 10 datasets with different prompts. MedSAM-2 is shown in the darkest blue, at the right of each bar group.
...and 2 more figures
