Automating MedSAM by Learning Prompts with Weak Few-Shot Supervision

Mélanie Gaillochet, Christian Desrosiers, Hervé Lombaert

TL;DR

The paper tackles the prompt-dependency bottleneck of segmentation foundation models in medical imaging. It introduces a lightweight prompt-learning module that derives a prompt embedding $Z_{pr}$ from the image embedding, so that MedSAM can segment a specified region using only weak bounding-box annotations and few-shot supervision. The approach keeps MedSAM's backbone frozen and optimizes a composite loss $\mathcal{L}_{total} = \mathcal{L}_{empty} + \lambda_1 \mathcal{L}_{tightbox} + \lambda_2 \mathcal{L}_{size}$ with a barrier-like formulation, using $w=5$ and $t=5$ to enforce the tightness and size constraints. Empirical results on HC18, CAMUS, and ACDC show competitive performance in the 10-shot setting and robustness to limited data, outperforming several fully supervised baselines on some tasks while substantially reducing annotation and computation costs. The work offers a practical path to automating prompt selection in medical segmentation, with code released for reproducibility and adaptation to new domains.
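To make the structure of the composite loss concrete, the following is a minimal pure-Python sketch and not the authors' implementation. It assumes that $w$ is the width of the bands used by the tightness constraint and that $t$ is the parameter of a log-barrier extension; the summary only says "barrier-like", so both readings are assumptions. The function names, the form of each term, and the `min_frac` lower bound on region size are hypothetical choices made for illustration.

```python
import math


def barrier(z, t=5.0):
    """Log-barrier extension: a smooth penalty for the constraint z <= 0."""
    threshold = -1.0 / (t * t)
    if z <= threshold:
        return -math.log(-z) / t
    # Linear continuation keeps the penalty finite when the constraint is violated.
    return t * z - math.log(1.0 / (t * t)) / t + 1.0 / t


def band_sums(prob, box, w):
    """Return (foreground mass, band width) for each row band and column band of the box."""
    r0, r1, c0, c1 = box  # inclusive bounds
    row_mass = [sum(prob[r][c0:c1 + 1]) for r in range(r0, r1 + 1)]
    col_mass = [sum(prob[r][c] for r in range(r0, r1 + 1)) for c in range(c0, c1 + 1)]
    bands = []
    for line in (row_mass, col_mass):
        for start in range(0, len(line), w):
            chunk = line[start:start + w]
            bands.append((sum(chunk), len(chunk)))
    return bands


def total_loss(prob, box, w=5, t=5.0, lam1=1.0, lam2=1.0, min_frac=0.25):
    """Assumed form of L_total = L_empty + lam1 * L_tightbox + lam2 * L_size."""
    r0, r1, c0, c1 = box
    area = (r1 - r0 + 1) * (c1 - c0 + 1)
    n_pixels = len(prob) * len(prob[0])
    mass = sum(sum(row) for row in prob)
    inside = sum(sum(prob[r][c0:c1 + 1]) for r in range(r0, r1 + 1))

    # L_empty: mean foreground probability outside the box, which should be zero.
    l_empty = (mass - inside) / max(n_pixels - area, 1)

    # L_tightbox: each band of width k crossing the box should hold at least k
    # foreground pixels, i.e. k - band_mass <= 0.
    bands = band_sums(prob, box, w)
    l_tight = sum(barrier(k - m, t) for m, k in bands) / len(bands)

    # L_size: min_frac * area <= mass <= area (min_frac is a hypothetical bound).
    l_size = barrier(min_frac * area - mass, t) + barrier(mass - area, t)

    return l_empty + lam1 * l_tight + lam2 * l_size
```

Here `prob` is a 2D list of foreground probabilities and `box` is a tuple `(row_min, row_max, col_min, col_max)` with inclusive bounds. Under these assumptions, a prediction that fills the box scores lower than an empty prediction or one that covers the whole image, which is the behaviour the three terms are meant to encourage.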

Abstract

Foundation models such as the recently introduced Segment Anything Model (SAM) have achieved remarkable results in image segmentation tasks. However, these models typically require user interaction through handcrafted prompts such as bounding boxes, which limits their deployment to downstream tasks. Adapting these models to a specific task with fully labeled data also demands expensive prior user interaction to obtain ground-truth annotations. This work proposes to replace conditioning on input prompts with a lightweight module that directly learns a prompt embedding from the image embedding, both of which are subsequently used by the foundation model to output a segmentation mask. Our foundation models with learnable prompts can automatically segment any specific region by 1) modifying the input through a prompt embedding predicted by a simple module, and 2) using weak labels (tight bounding boxes) and few-shot supervision (10 samples). Our approach is validated on MedSAM, a version of SAM fine-tuned for medical images, with results on three medical datasets in MR and ultrasound imaging. Our code is available at https://github.com/Minimel/MedSAMWeakFewShotPromptAutomation.
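As an illustration of the weak labels mentioned above, a tight bounding box is the smallest axis-aligned rectangle that contains the target region. The sketch below shows one way such a box could be derived from a binary mask for experimentation. The function name and the inclusive `(row_min, row_max, col_min, col_max)` convention are assumptions made here, not taken from the paper or its repository.

```python
def tight_bounding_box(mask):
    """Return (row_min, row_max, col_min, col_max), inclusive, or None for an empty mask."""
    # Rows and columns that contain at least one foreground pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], rows[-1], cols[0], cols[-1]
```

The mask is expected as a 2D list of 0/1 values. In an annotation workflow the box would be drawn directly by the annotator, which is what makes this form of supervision cheaper than a full mask.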

Paper Structure (11 sections, 7 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: Comparison between (a) MedSAM and (b) our automation of MedSAM via a learnt prompt module. Our prompt module replaces MedSAM's prompt encoder and learns to generate a relevant prompt embedding from the image embedding. Training employs losses that utilize only tight box labels.
  • Figure 2: Predicted segmentations on test samples of HC18 (row 1) and the right ventricle in ACDC (row 2). From left to right: (a) MedSAM prompted with a tight box, (b-d) UNet, TransUNet and AutoSAM, trained with ground-truth masks, (e) PerSAM using one reference image with its ground-truth mask, and (f) our method trained on tight bounding boxes. All automatic methods are shown in the 10-shot setting, except PerSAM, which is a 1-shot approach. The ground-truth annotation is drawn in red, with the predicted segmentation mask overlaid in yellow.