SAM-SP: Self-Prompting Makes SAM Great Again

Chunpeng Zhou, Kangjie Ning, Qianqian Shen, Sheng Zhou, Zhi Yu, Haishuai Wang

TL;DR

This work tackles the domain gap in segmentation performance when applying SAM to specialized domains like medical images. It introduces SAM-SP, a self-prompting, LoRA-finetuned extension of SAM that learns to generate prompts from its own predictions and employs a self-distillation signal to refine those prompts, all without user prompts during training or inference. The approach achieves strong, prompt-free segmentation across diverse datasets, often surpassing vanilla SAM and several SAM-based methods, and it reduces reliance on expert prompts for practical deployment. The results demonstrate the viability of self-prompting and self-distillation as scalable strategies to broaden the applicability of Visual Foundation Models to domain-specific segmentation tasks, with a novel segmentation dataset Seg-GPR to bolster evaluation.

Abstract

The recently introduced Segment Anything Model (SAM), a Visual Foundation Model (VFM), has demonstrated impressive capabilities in zero-shot segmentation tasks across diverse natural image datasets. Despite its success, SAM suffers noticeable performance degradation when applied to specific domains, such as medical images. Current efforts to address this issue rely on fine-tuning strategies intended to improve the generalizability of the vanilla SAM. However, these approaches still largely require domain-specific, expert-level prompts during the evaluation phase, which severely constrains the model's practicality. To overcome this limitation, we introduce a novel self-prompting based fine-tuning approach, called SAM-SP, tailored for extending the vanilla SAM model. Specifically, SAM-SP uses the output from the previous iteration of the model itself as prompts to guide subsequent iterations of the model. This self-prompting module learns to generate useful prompts autonomously and alleviates the dependence on expert prompts during the evaluation phase, significantly broadening SAM's applicability. Additionally, we integrate a self-distillation module to further enhance the self-prompting process. Extensive experiments across various domain-specific datasets validate the effectiveness of the proposed SAM-SP. Our SAM-SP not only alleviates the reliance on expert prompts but also exhibits superior segmentation performance compared to state-of-the-art task-specific segmentation approaches, the vanilla SAM, and SAM-based approaches.
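
The self-prompting idea described in the abstract (the previous iteration's output becomes the prompt for the next iteration) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_mask_decoder` is a hypothetical stand-in for SAM's prompt encoder and mask decoder, the prompt is assumed to be a dense mask, and the blending weight `0.5` is arbitrary.

```python
import numpy as np


def sigmoid(x):
    """Map logits to probabilities in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-x))


def toy_mask_decoder(image_embedding, prompt_mask=None):
    """Hypothetical stand-in for a prompt encoder plus mask decoder.

    Returns mask logits. When a prompt mask is given, it nudges the logits
    toward the previous prediction (an assumption made for illustration).
    """
    logits = image_embedding.copy()
    if prompt_mask is not None:
        # Center the previous probabilities at 0 so confident pixels
        # push the logits further in the same direction.
        logits = logits + 0.5 * (prompt_mask - 0.5)
    return logits


def self_prompt(image_embedding, num_iterations=2):
    """Run the decoder repeatedly, feeding each output back as the next prompt."""
    prompt = None  # no user or expert prompt is supplied at any point
    outputs = []
    for _ in range(num_iterations):
        logits = toy_mask_decoder(image_embedding, prompt)
        prob = sigmoid(logits)
        outputs.append(prob)
        prompt = prob  # previous output becomes the next prompt
    return outputs


if __name__ == "__main__":
    embedding = np.array([[2.0, -2.0], [0.0, 1.0]])
    masks = self_prompt(embedding, num_iterations=3)
    print(len(masks), masks[-1].shape)
```

In this toy setup the first pass runs without any prompt, and later passes only see what the model itself produced, which mirrors the prompt-free evaluation setting the abstract describes.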

Paper Structure

This paper contains 22 sections, 3 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Illustration of our model. (a) SAM and SAM-based approaches both rely on expert prompts during inference. (b) Our SAM-SP builds a self-prompting module and does not rely on expert prompts during inference.
  • Figure 2: The overall training architecture of our proposed SAM-SP, which inherits from SAM and contains three additional modules: LoRA-based fine-tuning, a Self-Prompting module, and a Self-Distillation module. SAM-SP significantly enhances the segmentation capability of SAM in specific domains and alleviates the reliance on expert prompts during inference. The two prompt encoders share parameters, as do the two mask decoders.
  • Figure 3: Visualization of segmentation results.
  • Figure 4: Quantitative comparison across different numbers of self-prompting iterations.
  • Figure 5: Comparison with different training strategies.
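
Figure 2 lists LoRA-based fine-tuning as one of the three added modules. As a rough illustration of the general LoRA idea (a frozen weight matrix plus a trainable low-rank update), and not of the paper's specific configuration, a minimal NumPy sketch might look like this. The class name `LoRALinear`, the rank, the scaling factor `alpha`, and the initialization scale are all illustrative assumptions.

```python
import numpy as np


class LoRALinear:
    """Linear layer with a frozen weight W and a trainable low-rank update B @ A."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # frozen pretrained weight, never updated
        # A starts small and random, B starts at zero, so the layer
        # initially behaves exactly like the frozen layer.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        frozen = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return frozen + self.scale * update

    def num_trainable(self):
        """Only A and B would be trained; W stays fixed."""
        return self.A.size + self.B.size


if __name__ == "__main__":
    W = np.random.default_rng(1).normal(size=(16, 32))
    layer = LoRALinear(W, rank=4)
    x = np.ones((2, 32))
    print(layer(x).shape, layer.num_trainable(), W.size)
```

The point of the design is that the number of trainable values grows with `rank * (in_dim + out_dim)` rather than `in_dim * out_dim`, which is why this style of fine-tuning is attractive for adapting a large pretrained model to a narrow domain.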