
SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation

Guoan Wang, Jin Ye, Junlong Cheng, Tianbin Li, Zhaolin Chen, Jianfei Cai, Junjun He, Bohan Zhuang

TL;DR

The paper addresses catastrophic forgetting when adapting volumetric medical foundation models to downstream tasks by introducing SAM-Med3D-MoE, a mixture-of-experts framework that unifies a general SAM-Med3D backbone with multiple task-specific finetuned decoders. A lightweight gating network processes image and prompt embeddings to assign a confidence to each expert, while a mask selector dynamically fuses the expert outputs, preserving the original model’s capabilities while enabling targeted adaptation. Key contributions include a gated MoE architecture, a mask selector with a threshold-driven fusion strategy, and experiments showing an average Dice improvement from $53$ to $56.4$ on 15 specific classes and a Dice of $48.9$ on the SPPIN2023 challenge, compared with $32.3$ for the general expert. The approach offers a practical, low-cost path to extending foundation models for 3D medical image segmentation in clinical settings; the authors state that code and datasets will be made publicly available.

Abstract

Volumetric medical image segmentation is pivotal in enhancing disease diagnosis, treatment planning, and advancing medical research. While existing volumetric foundation models for medical image segmentation, such as SAM-Med3D and SegVol, have shown remarkable performance on general organs and tumors, their ability to segment certain categories in clinical downstream tasks remains limited. Supervised Finetuning (SFT) is an effective way to adapt such foundation models to specific downstream tasks, but at the cost of degrading the general knowledge stored in the original foundation model. To address this, we propose SAM-Med3D-MoE, a novel framework that seamlessly integrates task-specific finetuned models with the foundation model, creating a unified model whose only additional training expense is an extra gating network. This gating network, in conjunction with a selection strategy, allows the unified model to achieve performance comparable to that of the original models on their respective tasks, both general and specialized, without updating any of their parameters. Our comprehensive experiments demonstrate the efficacy of SAM-Med3D-MoE, with an average Dice increase from 53 to 56.4 on 15 specific classes. In particular, it achieves remarkable gains of 29.6, 8.5, and 11.2 on the spinal cord, esophagus, and right hip, respectively. Additionally, it achieves 48.9 Dice on the challenging SPPIN2023 Challenge, significantly surpassing the general expert's performance of 32.3. We anticipate that SAM-Med3D-MoE can serve as a new framework for adapting foundation models to specific areas in medical image analysis. Code and datasets will be publicly available.

Paper Structure (11 sections, 1 equation, 4 figures, 2 tables)

This paper contains 11 sections, 1 equation, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Advantages of SAM-Med3D-MoE in general tasks and specific downstream tasks. (a) SAM-Med3D, a foundation model for volumetric medical image segmentation, demonstrates remarkable performance in segmenting general organs and tumors. However, it performs notably worse when segmenting neuroblastoma, as observed in the SPPIN2023 challenge. (b) Finetuning SAM-Med3D on SPPIN2023 improves its neuroblastoma segmentation but diminishes its overall segmentation capability. (c) Our method handles both general and downstream tasks.
  • Figure 2: Overview of our SAM-Med3D-MoE approach. The outputs of the 3D Image Encoder and the Prompt Encoder are passed to the gating mechanism, which dynamically selects the experts. If the weight of the top-1 selection after softmax exceeds $\tau$, the most proficient finetuned expert decoder (specific expert) is chosen, together with the 3D Mask Decoder (general expert). Otherwise, only the 3D Mask Decoder is used.
  • Figure 3: Details of the gating network and the selector.
  • Figure 4: Our model exhibits strong performance on (a) the 15 selected categories as well as on (b) the original un-finetuned categories.
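
The threshold rule described in the Figure 2 caption can be illustrated with a short sketch. This is not the authors' implementation: the function names, the example logits, and the value of `tau` are all hypothetical, and the sketch assumes the gating network emits one logit per specific expert. It covers only the decision of which decoders to run; how the selected masks are fused is not specified in the text above and is left out.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of gate logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_specific_expert(gate_logits: List[float], tau: float) -> Optional[int]:
    """Return the index of the specific expert to run, or None.

    Assumption: gate_logits holds one score per finetuned (specific) expert.
    The general expert (3D Mask Decoder) is always run, so it is not part
    of the return value.
    """
    weights = softmax(gate_logits)
    top1 = max(range(len(weights)), key=lambda i: weights[i])
    # Use the specific expert only when the gate is confident enough.
    if weights[top1] > tau:
        return top1
    return None


def decoders_to_run(gate_logits: List[float], tau: float) -> List[str]:
    """List the decoders used for one input, general expert first."""
    chosen = select_specific_expert(gate_logits, tau)
    names = ["general"]
    if chosen is not None:
        names.append(f"specific_{chosen}")
    return names
```

With a hypothetical `tau` of 0.5, a confident gate such as `decoders_to_run([4.0, 0.0, 0.0], 0.5)` selects both the general expert and specific expert 0, while a nearly uniform gate such as `decoders_to_run([0.1, 0.0, 0.0], 0.5)` falls back to the general expert alone.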