DEAP-3DSAM: Decoder Enhanced and Auto Prompt SAM for 3D Medical Image Segmentation

Fangda Chen, Jintao Tang, Pancheng Wang, Ting Wang, Shasha Li, Ting Deng

TL;DR

DEAP-3DSAM addresses key limitations of SAM-based 3D medical segmentation by introducing a Feature Enhanced Decoder that fuses original image spatial details with SAM features and a Dual Attention Prompter that automatically generates prompt information via Spatial and Channel Attention. The method uses a 12-layer Image Encoder with a Scale Parallel Adapter, processing pseudo-3D patches to produce multiscale features that feed the decoder at multiple depths. Across four abdominal tumor datasets, DEAP-3DSAM achieves state-of-the-art or competitive results, outperforming or matching manual prompt methods and offering notable efficiency gains through linear self-attention and parameter sharing. The approach demonstrates the practical potential of fully automated SAM-based 3D segmentation, with implications for automated lesion localization and feature extraction in medical imaging.

Abstract

The Segment Anything Model (SAM) has recently demonstrated significant potential in medical image segmentation. Although SAM is primarily trained on 2D images, attempts have been made to apply it to 3D medical image segmentation. However, the pseudo 3D processing used to adapt SAM results in spatial feature loss, limiting its performance. Additionally, most SAM-based methods still rely on manual prompts, which are challenging to implement in real-world scenarios and require extensive external expert knowledge. To address these limitations, we introduce the Decoder Enhanced and Auto Prompt SAM (DEAP-3DSAM). Specifically, we propose a Feature Enhanced Decoder that fuses in the original image features, which carry rich and detailed spatial information, to enhance spatial features. We also design a Dual Attention Prompter to automatically obtain prompt information through Spatial Attention and Channel Attention. We conduct comprehensive experiments on four public abdominal tumor segmentation datasets. The results indicate that DEAP-3DSAM achieves state-of-the-art performance in 3D image segmentation, outperforming or matching existing manual prompt methods. Furthermore, both quantitative and qualitative ablation studies confirm the effectiveness of our proposed modules.

Paper Structure

This paper contains 24 sections, 18 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: An illustration of pseudo 3D patching and expected 3D patching.
  • Figure 2: The overall framework of our DEAP-3DSAM. (a) shows the overall pipeline. The 3D medical image is first decomposed into a series of patches through a pseudo 3D patching process. These patches are then processed by the Image Encoder. The feature map from the final transformer layer, denoted as $z_{12}$, is processed by the Dual Attention Prompter. Subsequently, $z_{12}$, along with the intermediate feature maps $z_3$, $z_6$, and $z_9$, is fed into the Feature Enhanced Decoder. The final segmentation predictions are generated by the Predict Layer within the Feature Enhanced Decoder. (b) illustrates the Feature Enhanced Decoder, which is primarily composed of Original Feature Enhancers. The Original Feature Enhancer merges the upsampled feature map with the original features from the input image and outputs the augmented features through a convolutional block. (c) shows that the Dual Attention Prompter applies both Spatial and Channel Attention and concatenates the resulting features for output. Finally, (d), (e), and (f) depict the structure of the adapter, convolution block, and prediction layer, respectively.
  • Figure 3: Qualitative comparison visualization of DEAP-3DSAM and baselines on four datasets.
  • Figure 4: Qualitative analysis visualization of Feature Enhanced Decoder (FED) on four datasets.
  • Figure 5: Quantitative analysis visualization of Feature Enhanced Decoder (FED) on four datasets.
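
The Figure 2(c) caption describes the Dual Attention Prompter as applying Spatial Attention and Channel Attention to $z_{12}$ and concatenating the two results. The snippet below is a minimal sketch of that idea only, not the authors' implementation. It assumes a flattened token map of shape (N, C), uses plain softmax self-attention without learned projections (the TL;DR mentions linear self-attention, which is not reproduced here), and all function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_attention(z):
    """Attend across spatial positions (hypothetical helper).

    z has shape (N, C): N tokens, C channels. The weights form an
    (N, N) matrix, so each output token is a mixture of all tokens.
    """
    scores = z @ z.T / np.sqrt(z.shape[1])
    return softmax(scores, axis=-1) @ z  # (N, C)


def channel_attention(z):
    """Attend across channels (hypothetical helper).

    The weights form a (C, C) matrix, so each output channel is a
    mixture of all channels.
    """
    scores = z.T @ z / np.sqrt(z.shape[0])
    return z @ softmax(scores, axis=-1).T  # (N, C)


def dual_attention_prompt(z):
    """Concatenate both attention outputs along the channel axis.

    Assumption: concatenation is channel-wise, giving shape (N, 2C).
    """
    return np.concatenate([spatial_attention(z), channel_attention(z)], axis=-1)


if __name__ == "__main__":
    # Stand-in for the final encoder feature map z_12: 16 tokens, 8 channels.
    z12 = np.random.default_rng(0).standard_normal((16, 8))
    prompt = dual_attention_prompt(z12)
    print(prompt.shape)  # (16, 16)
```

In this sketch the prompt features come entirely from the encoder output, which is what removes the need for a manually supplied point or box prompt. A real module would add learned query, key, and value projections around both attention branches.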