PP-SAM: Perturbed Prompts for Robust Adaptation of Segment Anything Model for Polyp Segmentation

Md Mostafijur Rahman; Mustafa Munir; Debesh Jha; Ulas Bagci; Radu Marculescu

TL;DR

PP-SAM is a robust, data-efficient fine-tuning framework that adapts the Segment Anything Model (SAM) to polyp segmentation when annotated data is limited. It injects variable bounding box prompt perturbations during training and is evaluated across a range of inference-time perturbations, which markedly improves DICE robustness and cross-center generalization. Key findings are substantial few-shot gains on Kvasir and strong improvements over zero-shot SAM and recent SOTA methods on unseen clinics, pointing to practical utility for multi-center colorectal cancer screening with a reduced annotation burden. The approach freezes the mask decoder while fine-tuning the image and prompt encoders, and the results motivate its use for other medical imaging tasks with limited samples; the code is openly available.

Abstract

The Segment Anything Model (SAM), originally designed for general-purpose segmentation tasks, has recently been used for polyp segmentation. Nonetheless, fine-tuning SAM with data from new imaging centers or clinics poses significant challenges: it requires creating an annotated dataset, which is expensive and time-intensive, and user prompts may vary during inference. To address these issues, we propose a robust fine-tuning technique, PP-SAM, that allows SAM to adapt to the polyp segmentation task with limited images. To this end, we utilize variable perturbed bounding box prompts (BBP) to enrich the learning context and enhance the model's robustness to BBP perturbations during inference. Rigorous experiments on polyp segmentation benchmarks reveal that our variable BBP perturbation significantly improves model resilience. Notably, on Kvasir, 1-shot fine-tuning boosts the DICE score by 20% and 37% with 50- and 100-pixel BBP perturbations during inference, respectively. Moreover, our experiments show that 1-shot, 5-shot, and 10-shot PP-SAM with 50-pixel perturbations during inference outperform a recent state-of-the-art (SOTA) polyp segmentation method by 26%, 7%, and 5% DICE scores, respectively. Our results motivate the broader applicability of our PP-SAM for other medical imaging tasks with limited samples. Our implementation is available at https://github.com/SLDGroup/PP-SAM.
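The DICE scores quoted in the abstract measure the overlap between a predicted mask and the ground truth, using the standard definition 2|A ∩ B| / (|A| + |B|). The snippet below is a minimal sketch of that metric, not the evaluation code from the PP-SAM repository; the function name `dice_score`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; not taken from the PP-SAM code base.
    """
    intersection = 0
    pred_sum = 0
    target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += p * t  # counts pixels set in both masks
            pred_sum += p
            target_sum += t
    if pred_sum + target_sum == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / (pred_sum + target_sum)


if __name__ == "__main__":
    prediction = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [0, 0]]
    print(dice_score(prediction, ground_truth))
```

In this toy example one pixel overlaps, the prediction has two foreground pixels and the ground truth has one, so the score is 2·1 / (2 + 1).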

Paper Structure (28 sections, 1 equation, 9 figures)

Introduction
Related Work
Segment anything model
SAM in medical image segmentation
Polyp segmentation
Methodology
Prompts
Variable perturbed prompts for fine-tuning
Prompts during inference
SAM architecture
Image encoder
Prompt encoder
Mask decoder
Transfer learning
Limited data settings
...and 13 more sections

Figures (9)

Figure 1: Few-shot fine-tuning pipeline. Here, 'no perturbation' represents the bounding box extracted from the original ground truth (GT) masks; 'variable perturbations' means extending the bounding box on each side separately.
Figure 2: Transfer learning abilities of different modules of SAM on the Kvasir [jha2020kvasir] testset. As shown, freezing only the mask decoder produces the best results.
Figure 3: Comparison of different levels of bounding box perturbations during training, on the Kvasir testset. As shown, our variable prompt perturbation produces the overall best results.
Figure 4: Experimental results on Kvasir testset. All models are trained using the randomly sampled images from the Kvasir trainset. We use our variable perturbed bounding box (in the range of 0 to 50) during training. Also, we keep the mask decoder frozen during these experiments.
Figure 5: Experimental results on the unseen ClinicDB testset. For these experiments, we use the models trained on randomly sampled images from the Kvasir trainset.
...and 4 more figures
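The captions for Figures 1 and 4 describe the prompt construction: a bounding box is extracted from the ground truth mask, then each side is extended separately by a variable amount in the range 0 to 50 pixels. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: the helper names `bbox_from_mask` and `perturb_bbox`, the uniform integer sampling, and the clamping to image bounds are all hypothetical choices.

```python
import random


def bbox_from_mask(mask):
    """Tight box (x_min, y_min, x_max, y_max) around the nonzero pixels of a 2D list."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        raise ValueError("mask contains no foreground pixels")
    return min(xs), min(ys), max(xs), max(ys)


def perturb_bbox(box, width, height, max_shift=50, rng=random):
    """Extend each side of the box outward by its own random offset.

    Assumption: offsets are drawn uniformly from 0..max_shift pixels,
    independently per side, and the result is clamped to the image.
    """
    x_min, y_min, x_max, y_max = box
    left, top, right, bottom = (rng.randint(0, max_shift) for _ in range(4))
    return (
        max(0, x_min - left),
        max(0, y_min - top),
        min(width - 1, x_max + right),
        min(height - 1, y_max + bottom),
    )


if __name__ == "__main__":
    # Toy 10x10 mask with a small rectangular "polyp".
    mask = [[0] * 10 for _ in range(10)]
    for y in (4, 5):
        for x in range(3, 7):
            mask[y][x] = 1
    tight = bbox_from_mask(mask)
    print("no perturbation:", tight)
    print("variable perturbation:", perturb_bbox(tight, 10, 10, max_shift=3))
```

Drawing a separate offset per side means the perturbed box is generally not centered on the polyp, which is one plausible way to mimic loosely drawn user prompts at inference time. Setting `max_shift=0` recovers the unperturbed box.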
