Customized Segment Anything Model for Medical Image Segmentation

Kaidong Zhang, Dong Liu

TL;DR

Addresses medical image segmentation by repurposing a large-scale segmentation system (SAM) for semantic medical segmentation. Proposes SAMed, which uses LoRA to fine-tune the image encoder and trains the prompt encoder and mask decoder to output tissue-class masks without requiring prompts at inference. Demonstrates competitive Synapse results (DSC 81.88, HD 20.64) and shows training strategies (warmup, AdamW) that stabilize fine-tuning while keeping deployment/storage overhead low. Concludes that SAMed offers a practical, SAM-compatible path for domain-specific segmentation with extensive ablations supporting design choices.

Abstract

We propose SAMed, a general solution for medical image segmentation. Unlike previous methods, SAMed is built upon the large-scale image segmentation model, Segment Anything Model (SAM), to explore the new research paradigm of customizing large-scale models for medical image segmentation. SAMed applies the low-rank-based (LoRA) finetuning strategy to the SAM image encoder and finetunes it together with the prompt encoder and the mask decoder on labeled medical image segmentation datasets. We also observe that the warmup finetuning strategy and the AdamW optimizer lead SAMed to successful convergence and lower loss. Unlike SAM, SAMed can perform semantic segmentation on medical images. Our trained SAMed model achieves 81.88 DSC and 20.64 HD on the Synapse multi-organ segmentation dataset, which is on par with state-of-the-art methods. We conduct extensive experiments to validate the effectiveness of our design. Since SAMed updates only a small fraction of the SAM parameters, its deployment and storage costs are marginal in practical usage. The code of SAMed is available at https://github.com/hitachinsk/SAMed.
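
To make the warmup strategy mentioned in the abstract concrete, here is a minimal sketch of a learning-rate schedule that ramps up linearly and then decays. The abstract only states that warmup and AdamW help convergence; the function name `warmup_lr`, the linear ramp, the polynomial decay, and every hyperparameter value below are assumptions chosen for illustration, not the settings used by SAMed.

```python
def warmup_lr(step, base_lr, warmup_steps, total_steps, power=0.9):
    """Hypothetical schedule: linear warmup, then polynomial decay to zero.

    All arguments are illustrative; the abstract does not specify them.
    """
    if step < warmup_steps:
        # Ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * (1.0 - progress) ** power


if __name__ == "__main__":
    # Print a few points of the schedule for made-up settings.
    for s in (0, 4, 9, 10, 55, 100):
        print(s, warmup_lr(s, base_lr=1.0, warmup_steps=10, total_steps=100))
```

In a training loop, the value returned for the current step would be written into the optimizer's learning rate before each update (for example, an AdamW optimizer). Starting with small steps limits how far the newly added trainable parameters move while their gradients are still noisy.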

Paper Structure

This paper contains 23 sections, 5 equations, 6 figures, and 7 tables.

Figures (6)

  • Figure 1: SAMed inherits the remarkable image segmentation performance of SAM and further refines the segmentation boundaries with anatomical prior knowledge from professional medical data during the customization process. Moreover, SAMed can understand the semantic class of each segmentation region by classifying these regions into different meaningful tissues for automatic medical image semantic segmentation.
  • Figure 2: The pipeline of SAMed. The framework of SAMed is consistent with SAM. We freeze the image encoder and insert additional trainable LoRA layers into SAM for medical image feature extraction. Moreover, we finetune the prompt encoder (with default embeddings) and the mask decoder to achieve precise semantic segmentation on medical images.
  • Figure 3: The LoRA design adopted in SAMed. We apply a LoRA layer to the q and v projection layers of each transformer block in the image encoder. "Proj.q", "Proj.k", "Proj.v" and "Proj.o" represent the projection layers of q, k, v and o, respectively.
  • Figure 4: The detailed framework of the mask decoder. SAMed integrates the sparse and dense embeddings into the encoded image embedding. After processing by the transformer layer, a segmentation map and its IoU are generated individually for each class. We adopt postprocessing to aggregate these segmentation maps into the final segmentation result.
  • Figure 5: Qualitative comparisons between SAMed and the SOTA methods, including TransUnet [chen2021transunet], SwinUnet [swinunet] and DAE-Former [azad2023daeformer].
  • ...and 1 more figures
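
The LoRA design described in the Figure 3 caption (a frozen projection plus a trainable low-rank bypass) can be sketched in a few lines. This is an illustrative, dependency-free sketch and not the SAMed implementation: the class name `LoRALinear`, the rank, the initialization scale, and the zero initialization of the second factor are assumptions taken from the general LoRA recipe, and the matrix sizes are made up.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


class LoRALinear:
    """Frozen weight W (d_in x d_out) plus a trainable low-rank update A @ B.

    Hypothetical helper for illustration; only A and B would be trained.
    """

    def __init__(self, weight, rank, seed=0):
        rng = random.Random(seed)
        d_in, d_out = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained projection (e.g. q or v)
        # A: d_in x rank, small random init (assumed scale).
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(rank)] for _ in range(d_in)]
        # B: rank x d_out, zero init so the layer starts identical to W.
        self.b = [[0.0] * d_out for _ in range(rank)]

    def forward(self, x):
        # x: n x d_in. Output = x W + (x A) B.
        return add(matmul(x, self.weight), matmul(matmul(x, self.a), self.b))

    def num_trainable(self):
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


if __name__ == "__main__":
    d = 8
    w = [[float(i == j) for j in range(d)] for i in range(d)]  # toy frozen weight
    layer = LoRALinear(w, rank=2)
    print(layer.forward([[1.0] * d]))
    print("trainable:", layer.num_trainable(), "frozen:", d * d)
```

Because the bypass adds only `rank * (d_in + d_out)` trainable values per projection, the set of parameters that must be stored per task stays small relative to the frozen encoder, which is consistent with the abstract's point about low deployment and storage cost.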