SAM-Med3D: Towards General-purpose Segmentation Models for Volumetric Medical Images

Haoyu Wang, Sizheng Guo, Jin Ye, Zhongying Deng, Junlong Cheng, Tianbin Li, Jianpin Chen, Yanzhou Su, Ziyan Huang, Yiqing Shen, Bin Fu, Shaoting Zhang, Junjun He, Yu Qiao

TL;DR

SAM-Med3D presents a fully 3D, promptable segmentation framework trained on a large-scale volumetric medical dataset to achieve general-purpose segmentation across diverse organs, lesions, and modalities. Through a two-stage training regime and 3D positional encoding, it outperforms 2D-adapted and task-specific baselines while offering superior efficiency and transferability as a pre-trained encoder for downstream tasks. The SA-Med3D-140K dataset is key to this generalization, and a comprehensive evaluation on 16 datasets demonstrates robust cross-domain performance and practical potential for clinical workflows. The work provides a valuable resource and establishes a strong benchmark for promptable 3D medical image segmentation.

Abstract

Existing volumetric medical image segmentation models are typically task-specific, excelling at specific targets but struggling to generalize across anatomical structures or modalities. This limitation restricts their broader clinical use. In this paper, we introduce SAM-Med3D for general-purpose segmentation on volumetric medical images. Given only a few 3D prompt points, SAM-Med3D can accurately segment diverse anatomical structures and lesions across various modalities. To achieve this, we gather and process a large-scale 3D medical image dataset, SA-Med3D-140K, from a blend of public sources and licensed private datasets. This dataset includes 22K 3D images and 143K corresponding 3D masks. SAM-Med3D, a promptable segmentation model characterized by a fully learnable 3D structure, is then trained on this dataset using a two-stage procedure and exhibits impressive performance on both seen and unseen segmentation targets. We comprehensively evaluate SAM-Med3D on 16 datasets covering diverse medical scenarios, including different anatomical structures, modalities, targets, and zero-shot transferability to new/unseen tasks. The evaluation shows the efficiency and efficacy of SAM-Med3D, as well as its promising application to diverse downstream tasks as a pre-trained model. Our approach demonstrates that substantial medical resources can be utilized to develop a general-purpose medical AI for various potential applications. Our dataset, code, and models are available at https://github.com/uni-medical/SAM-Med3D.

Paper Structure (20 sections, 5 figures, 7 tables)

Figures (5)

  • Figure 1: Illustration of SAM, fine-tuned SAM (SAM-Med2D), and our SAM-Med3D on 3D volumetric medical images. Both SAM and SAM-Med2D take $N$ prompt points (one for each slice), whereas SAM-Med3D uses a single prompt point for the entire 3D volume. Here, $N$ corresponds to the number of slices containing the target object. The top-left corner provides a schematic of the axial, coronal, and sagittal views. For a given 3D input, we visualize the 3D, coronal, and multiple axial views. The numbers in brackets indicate the index of each axial slice.
  • Figure 2: Overview of SA-Med3D-140K. (a) Word cloud of category statistics for all training data, which contain 245 categories. (b) Comparison of image and mask counts across 3D medical image datasets. Our dataset consists of 22K 3D images with 143K corresponding 3D masks, while AMOS and TotalSegmentator have fewer than 2K images and BraTS21 has fewer than 10K masks.
  • Figure 3: The fully 3D architecture of our SAM-Med3D, encompassing a 3D image encoder, 3D prompt encoder, and 3D mask decoder. 3D positional encoding (PE) and 3D layers like convolution and layer normalization are employed to construct it.
  • Figure 4: (a-c) Comparison across different modalities with varying numbers of points. Although SAM-Med3D, unlike SAM-Med2D, was not trained on the ultrasound (US) modality, it still shows competitive performance. (d) Comparison of the Dice score between SAM-Med3D and the 2D fine-tuned SAM (SAM-Med2D) across 44 major organs and 5 kinds of lesions. $*$ and $**$ represent unseen organs and lesions.
  • Figure 5: Visualization of SAM, SAM-Med2D, and our SAM-Med3D across diverse anatomical structures and modalities for 1 or 5 points. We present both axial and coronal/sagittal views to illustrate the 3D results comprehensively.
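
Two quantities from the captions above can be made concrete with a small sketch: the prompt count $N$ from Figure 1 (one point per slice containing the target, versus a single point per volume) and the Dice score compared in Figure 4. This is a minimal illustration in plain Python, not code from the SAM-Med3D repository. The function names, the nested-list mask layout `[axial slice][row][column]`, and the toy masks are all assumptions made for this example.

```python
from typing import List

# Assumed layout: binary mask indexed as [axial slice][row][column].
Volume = List[List[List[int]]]


def slices_with_target(mask: Volume) -> List[int]:
    """Indices of axial slices holding at least one foreground voxel."""
    return [z for z, sl in enumerate(mask) if any(v for row in sl for v in row)]


def prompt_points_needed(mask: Volume, per_slice: bool) -> int:
    """One click per target slice (2D-style) or one click per volume (3D-style)."""
    n_slices = len(slices_with_target(mask))
    if per_slice:
        return n_slices
    return 1 if n_slices > 0 else 0


def dice_score(pred: Volume, truth: Volume) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|); taken as 1.0 when both masks are empty."""
    p = [v for sl in pred for row in sl for v in row]
    t = [v for sl in truth for row in sl for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(1 for a in p if a) + sum(1 for b in t if b)
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 4-slice volume of 2x2 voxels; the target occupies slices 1 and 2.
empty = [[0, 0], [0, 0]]
truth = [empty, [[1, 1], [0, 0]], [[1, 0], [0, 0]], empty]
pred = [empty, [[1, 0], [0, 0]], [[1, 0], [0, 0]], empty]

n_2d = prompt_points_needed(truth, per_slice=True)   # one point per target slice
n_3d = prompt_points_needed(truth, per_slice=False)  # one point for the volume
score = dice_score(pred, truth)
```

In this toy case the slice-wise setting needs two prompt points and the volumetric setting needs one. The prediction overlaps two of the three ground-truth voxels, giving a Dice score of 2·2/(2+3) = 0.8.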