SAM3D: Segment Anything Model in Volumetric Medical Images

Nhat-Tan Bui, Dinh-Hieu Hoang, Minh-Triet Tran, Gianfranco Doretto, Donald Adjeroh, Brijesh Patel, Arabinda Choudhary, Ngan Le

TL;DR

SAM3D addresses 3D medical image segmentation by coupling a frozen SAM image encoder with a lightweight 3D decoder that captures depth relationships across slices. Unlike slice-by-slice SAM approaches, it processes entire volumes in a unified framework, achieving competitive accuracy with substantially fewer trainable parameters. The method is validated on four datasets (ACDC, Synapse, BraTS, Lung), reporting strong Dice and Hausdorff distance (HD) scores while keeping the model compact (≈1.88M parameters). The work shows that adapting a SAM pretrained on natural images in a simple, prompt-free fashion can yield efficient and effective volumetric segmentation.

Abstract

Image segmentation remains a pivotal component in medical image analysis, aiding in the extraction of critical information for precise diagnostic practices. With the advent of deep learning, automated image segmentation methods have risen to prominence, showcasing exceptional proficiency in processing medical imagery. Motivated by the Segment Anything Model (SAM), a foundation model renowned for its remarkable precision and robust generalization capabilities in segmenting 2D natural images, we introduce SAM3D, an innovative adaptation tailored for 3D volumetric medical image analysis. Unlike current SAM-based methods that segment volumetric data by converting the volume into separate 2D slices for individual analysis, our SAM3D model processes the entire 3D volume in a unified approach. Extensive experiments on multiple medical image datasets demonstrate that our network attains competitive results compared with other state-of-the-art methods in 3D medical segmentation tasks while being significantly more parameter-efficient. Code and checkpoints are available at https://github.com/UARK-AICV/SAM3D.

Paper Structure (6 sections, 3 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: Overall architecture of the proposed SAM3D. Given a volumetric image $I \in \mathbb{R}^{H\times W\times D}$, SAM3D initially applies SAM to process each of the $D$ slices individually, producing slice embeddings denoted as $F \in \mathbb{R}^{\frac{H}{16}\times \frac{W}{16}\times D\times 256}$. These embeddings are then decoded by a lightweight 3D decoder, ultimately yielding the segmentation prediction.
  • Figure 2: Architecture of the proposed lightweight 3D decoder.
  • Figure 3: Qualitative comparison between our SAM3D ($3^{rd}$ column) and the other SAM-based volumetric segmentation models SAMed ($4^{th}$ column) and SAMed_s ($5^{th}$ column) on the Synapse dataset. SAMed and SAMed_s require 18.81M and 6.32M parameters, respectively, whereas our SAM3D needs only 1.88M.
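
The shape bookkeeping in the Figure 1 caption can be sketched as follows. This is only an illustration of the arithmetic, not the authors' code: the function names are hypothetical, the volume size in the example is made up, and the divisibility check is an assumption of this sketch. The downsampling factor of 16 and the 256 channels are taken from the caption.

```python
def slice_embedding_shape(h, w, d, patch=16, channels=256):
    """Shape of the stacked slice embeddings F for a volume of size H x W x D.

    Each of the d slices is encoded independently, so depth is preserved
    while height and width shrink by the patch factor.
    patch=16 and channels=256 follow the Figure 1 caption.
    """
    # Assumption of this sketch: H and W divide evenly by the patch size.
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    return (h // patch, w // patch, d, channels)


def num_embedding_values(shape):
    """Total number of scalar values held in an embedding of this shape."""
    total = 1
    for n in shape:
        total *= n
    return total


# Hypothetical volume size, chosen only for illustration.
shape = slice_embedding_shape(512, 512, 32)
print(shape)                        # (32, 32, 32, 256)
print(num_embedding_values(shape))  # 8388608
```

Because depth $D$ passes through the encoder unchanged, any modelling of relationships between slices is left to the 3D decoder that consumes $F$.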