DeSAM: Decoupled Segment Anything Model for Generalizable Medical Image Segmentation

Yifan Gao, Wei Xia, Dingdu Hu, Wenkui Wang, Xin Gao

TL;DR

Domain generalization for medical image segmentation remains difficult under unseen distributions. The authors introduce DeSAM, a decoupled extension of the Segment Anything Model (SAM) that splits mask generation into a prompt-relevant IoU module (PRIM) and a prompt-decoupled mask module (PDMM), while freezing the encoders to preserve pre-trained knowledge. PDMM fuses multi-scale image embeddings with PRIM-derived mask embeddings, enabling robust automatic segmentation across domains. Evaluations on cross-site prostate and cross-modality abdominal datasets show DeSAM-P achieving state-of-the-art Dice scores and outperforming prior single-source domain generalization methods, which suggests practical potential for foundation-model-based medical segmentation with limited domain data.
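
For readers unfamiliar with the two overlap metrics named above, the following is a minimal sketch of how Dice and IoU are commonly computed for binary masks. It is not taken from the DeSAM code; the function names and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient for two flattened binary masks (values 0 or 1)."""
    intersection = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def iou_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Intersection over union for two flattened binary masks."""
    intersection = sum(p & t for p, t in zip(pred, target))
    union = sum(p | t for p, t in zip(pred, target))
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


pred = [1, 1, 0, 0]
target = [1, 0, 1, 0]
dice = dice_score(pred, target)
iou = iou_score(pred, target)
```

The two metrics are monotonically related for a single pair of masks (Dice = 2·IoU / (1 + IoU)), so they rank predictions identically per image but average differently across a dataset.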

Abstract

Deep learning-based medical image segmentation models often suffer from domain shift, where models trained on a source domain do not generalize well to other unseen domains. As a prompt-driven foundation model with powerful generalization capabilities, the Segment Anything Model (SAM) shows potential for improving the cross-domain robustness of medical image segmentation. However, SAM performs significantly worse in automatic segmentation scenarios than when manually prompted, hindering its direct application to domain generalization. Upon further investigation, we discovered that the degradation in performance was related to the coupling effect of inevitable poor prompts and mask generation. To address the coupling effect, we propose the Decoupled SAM (DeSAM). DeSAM modifies SAM's mask decoder by introducing two new modules: a prompt-relevant IoU module (PRIM) and a prompt-decoupled mask module (PDMM). PRIM predicts the IoU score and generates mask embeddings, while PDMM extracts multi-scale features from the intermediate layers of the image encoder and fuses them with the mask embeddings from PRIM to generate the final segmentation mask. This decoupled design allows DeSAM to leverage the pre-trained weights while minimizing the performance degradation caused by poor prompts. We conducted experiments on publicly available cross-site prostate and cross-modality abdominal image segmentation datasets. The results show that our DeSAM leads to a substantial performance improvement over previous state-of-the-art domain generalization methods. The code is publicly available at https://github.com/yifangao112/DeSAM.
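
The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the repository linked above for that): every function name below is hypothetical, the arithmetic is a placeholder for the real cross-attention transformer, IoU head, and residual blocks, and the only point is the interface. PRIM sees the prompt and returns a mask embedding plus an IoU score, while PDMM sees only the multi-scale image features and the mask embedding, never the prompt itself.

```python
from typing import List, Tuple

Vector = List[float]


def encode_image(image: Vector) -> List[Vector]:
    """Toy stand-in for the frozen image encoder (hypothetical).

    Returns one feature vector per "intermediate layer"; the last entry
    plays the role of the final image embedding.
    """
    return [[pixel * scale for pixel in image] for scale in (1.0, 0.5, 0.25)]


def encode_prompt(point: Tuple[int, int]) -> Vector:
    """Toy stand-in for the frozen prompt encoder (hypothetical)."""
    return [float(point[0]), float(point[1])]


def prim(image_embedding: Vector, prompt_embedding: Vector) -> Tuple[Vector, float]:
    """Prompt-relevant IoU module: returns (mask embedding, IoU score).

    Placeholder arithmetic only; the real module uses cross-attention.
    """
    strength = sum(prompt_embedding) / (1.0 + sum(abs(p) for p in prompt_embedding))
    mask_embedding = [value * strength for value in image_embedding]
    iou = max(0.0, min(1.0, strength))
    return mask_embedding, iou


def pdmm(multi_scale: List[Vector], mask_embedding: Vector) -> List[int]:
    """Prompt-decoupled mask module: note it takes no prompt argument.

    Fuses multi-scale features with the mask embedding, then thresholds.
    """
    fused = [
        sum(level[i] for level in multi_scale) + mask_embedding[i]
        for i in range(len(mask_embedding))
    ]
    return [1 if value > 1.0 else 0 for value in fused]


def desam_forward(image: Vector, point: Tuple[int, int]) -> Tuple[List[int], float]:
    features = encode_image(image)
    mask_embedding, iou = prim(features[-1], encode_prompt(point))
    return pdmm(features, mask_embedding), iou


mask, iou = desam_forward([0.0, 1.0, 2.0, 0.1], point=(1, 1))
```

In this arrangement a poor prompt can only influence the mask through the mask embedding, which is the decoupling the abstract describes.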

Paper Structure

This paper contains 16 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of the proposed DeSAM. DeSAM consists of the image and prompt encoders of SAM, a prompt-decoupled mask module (PDMM), and a prompt-relevant IoU module (PRIM). The image encoder is used to precompute the image embeddings before training. The prompt encoder is frozen during training. The PRIM consists of a cross-attention transformer and an IoU prediction head, and it uses the image and prompt embeddings to generate mask embeddings and an IoU score. The PDMM contains multiple channel attention-based residual blocks (SRB) and upsampling operations, and it integrates the mask embeddings and image embeddings to generate the mask.
  • Figure 2: Design choices of the decoder. (a) Generating a mask by directly using the image embedding from the encoder (PDMM only). (b) PDMM and PRIM without IoU prediction head. (c) PDMM and PRIM without mask embedding fusion. (d) Our proposed DeSAM.
  • Figure 3: Quantitative results for different numbers of points.
  • Figure 4: Visual comparison of different methods for cross-site prostate segmentation and cross-modality abdominal multi-organ segmentation. GT represents the ground truth.