Automating MedSAM by Learning Prompts with Weak Few-Shot Supervision
Mélanie Gaillochet, Christian Desrosiers, Hervé Lombaert
TL;DR
The paper tackles the prompt-dependency bottleneck of segmentation foundation models in medical imaging by introducing a lightweight prompt-learning module that derives a prompt embedding $Z_{pr}$ from the image embedding, enabling MedSAM to segment a specified region using only weak bounding-box annotations and few-shot supervision. The approach keeps MedSAM's backbone frozen and optimizes a composite loss $\mathcal{L}_{total} = \mathcal{L}_{empty} + \lambda_1 \mathcal{L}_{tightbox} + \lambda_2 \mathcal{L}_{size}$ with a barrier-like formulation, using $w=5$ and $t=5$ to enforce the tightness and size constraints. Empirical results on HC18, CAMUS, and ACDC show competitive performance in 10-shot settings and robustness to limited data, outperforming several fully supervised baselines on some tasks while reducing annotation and computation costs. The work offers a practical path to automating prompt selection in medical segmentation, with code released for reproducibility and adaptation to new domains.
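To make the structure of the composite loss concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the "barrier-like formulation" can be illustrated with a standard log-barrier extension, that $w$ is the width of the bands used in the tightness term, and that $t$ is the barrier sharpness. The function names, the default weights `lam1` and `lam2`, the size bounds `size_frac`, and the normalizations are all hypothetical choices made for illustration.

```python
import math


def log_barrier_ext(z, t=5.0):
    """Log-barrier extension penalizing violations of the constraint z <= 0.

    Uses the log barrier where the constraint is comfortably satisfied and a
    linear extension elsewhere, so the penalty is finite for any z.
    """
    if z <= -1.0 / t ** 2:
        return -math.log(-z) / t
    return t * z - math.log(1.0 / t ** 2) / t + 1.0 / t


def weak_box_loss(prob, box, w=5, t=5.0, lam1=1.0, lam2=1.0,
                  size_frac=(0.2, 0.9)):
    """Illustrative L_empty + lam1 * L_tightbox + lam2 * L_size.

    prob: 2D list of foreground probabilities in [0, 1].
    box:  (r0, c0, r1, c1), half-open and assumed non-empty.
    lam1, lam2, size_frac: placeholder values, not taken from the paper.
    """
    r0, c0, r1, c1 = box
    n_rows, n_cols = len(prob), len(prob[0])

    # Emptiness: predicted foreground mass outside the box should be zero.
    outside = [prob[r][c] for r in range(n_rows) for c in range(n_cols)
               if not (r0 <= r < r1 and c0 <= c < c1)]
    l_empty = sum(outside) / max(len(outside), 1)

    # Tightness: each band of up to w rows (or columns) crossing the box
    # should contain at least as much foreground mass as the band is wide.
    terms = []
    for start in range(r0, r1, w):
        stop = min(start + w, r1)
        mass = sum(prob[r][c] for r in range(start, stop)
                   for c in range(c0, c1))
        terms.append(log_barrier_ext((stop - start) - mass, t))
    for start in range(c0, c1, w):
        stop = min(start + w, c1)
        mass = sum(prob[r][c] for r in range(r0, r1)
                   for c in range(start, stop))
        terms.append(log_barrier_ext((stop - start) - mass, t))
    l_tight = sum(terms) / len(terms)

    # Size: foreground mass inside the box should lie between two
    # (assumed) fractions of the box area.
    area = (r1 - r0) * (c1 - c0)
    size = sum(prob[r][c] for r in range(r0, r1) for c in range(c0, c1))
    lo, hi = size_frac
    l_size = (log_barrier_ext(lo * area - size, t)
              + log_barrier_ext(size - hi * area, t))

    return l_empty + lam1 * l_tight + lam2 * l_size


def make_prediction(inside_value, outside_value, shape=(20, 20),
                    box=(5, 5, 15, 15)):
    """Build a toy probability map that is constant inside and outside the box."""
    r0, c0, r1, c1 = box
    return [[inside_value if (r0 <= r < r1 and c0 <= c < c1) else outside_value
             for c in range(shape[1])] for r in range(shape[0])]
```

As a usage example, a toy prediction that fills the box should score lower than one that places its foreground outside the box: compare `weak_box_loss(make_prediction(0.8, 0.0), (5, 5, 15, 15))` with `weak_box_loss(make_prediction(0.0, 0.8), (5, 5, 15, 15))`. In an actual training loop these terms would be computed on differentiable tensors so that gradients reach the prompt module; plain Python lists are used here only to keep the sketch self-contained.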
Abstract
Foundation models such as the recently introduced Segment Anything Model (SAM) have achieved remarkable results in image segmentation tasks. However, these models typically require user interaction through handcrafted prompts such as bounding boxes, which limits their deployment in downstream tasks. Adapting these models to a specific task with fully labeled data also demands expensive prior user interaction to obtain ground-truth annotations. This work proposes to replace conditioning on input prompts with a lightweight module that directly learns a prompt embedding from the image embedding, both of which are subsequently used by the foundation model to output a segmentation mask. Our foundation models with learnable prompts can automatically segment any specific region by 1) modifying the input through a prompt embedding predicted by a simple module, and 2) using weak labels (tight bounding boxes) and few-shot supervision (10 samples). Our approach is validated on MedSAM, a version of SAM fine-tuned for medical images, with results on three medical datasets in MR and ultrasound imaging. Our code is available at https://github.com/Minimel/MedSAMWeakFewShotPromptAutomation.
