ProtoSAM: One-Shot Medical Image Segmentation With Foundational Models
Lev Ayzenberg, Raja Giryes, Hayit Greenspan
TL;DR
This work addresses the challenge of label-efficient medical image segmentation by introducing ProtoSAM, a one-shot framework that combines prototypical networks with the Segment Anything Model. A DINOv2-encoder-based ALPNet creates an initial coarse segmentation from a single support image, from which prompts (points and bounding boxes) are extracted to guide SAM for refined, one-shot segmentation. The approach achieves strong results across abdominal CT/MRI and polyp datasets, often outperforming supervised and other foundation-model baselines, and can be further boosted via encoder fine-tuning (EFT). ProtoSAM demonstrates the potential of leveraging large-scale foundation models for rapid, label-efficient medical image analysis, with future work focusing on improved prompt generation and broader task applicability.
Abstract
This work introduces ProtoSAM, a new framework for one-shot medical image segmentation. It combines prototypical networks, known for few-shot segmentation, with the Segment Anything Model (SAM), a natural-image foundation model. The proposed method creates an initial coarse segmentation mask using the ALPNet prototypical network, augmented with a DINOv2 encoder. Prompts such as points and bounding boxes are then extracted from this initial mask and input into SAM. We show state-of-the-art results on several medical image datasets and demonstrate automated segmentation from a single image example (one shot), with no need for fine-tuning of the foundation model. Our code is available at: https://github.com/levayz/ProtoSAM
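
To make the prompt-extraction step concrete, the following is a minimal sketch of how a bounding box and a single point prompt might be derived from a coarse mask. It is an illustration under stated assumptions, not the implementation in the ProtoSAM repository: the function name `mask_to_prompts`, the 0.5 threshold, and the choice of the foreground pixel nearest the centroid as the point prompt are all hypothetical. The box is returned as (x_min, y_min, x_max, y_max) and the point as (x, y), a layout assumed here for convenience when passing prompts to a SAM-style predictor.

```python
import numpy as np


def mask_to_prompts(coarse_mask, threshold=0.5):
    """Derive a box prompt and one point prompt from a coarse mask.

    Hypothetical helper for illustration only; the threshold and the
    point-selection rule are assumptions, not the paper's method.
    """
    binary = np.asarray(coarse_mask) > threshold
    if not binary.any():
        # Empty coarse mask: there is nothing to prompt with.
        return None, None

    rows, cols = np.nonzero(binary)

    # Tight bounding box around the foreground, as (x_min, y_min, x_max, y_max).
    box = np.array([cols.min(), rows.min(), cols.max(), rows.max()])

    # The centroid can fall outside a non-convex region, so snap it to the
    # nearest foreground pixel to guarantee the point lies on the mask.
    cy, cx = rows.mean(), cols.mean()
    nearest = np.argmin((rows - cy) ** 2 + (cols - cx) ** 2)
    point = np.array([cols[nearest], rows[nearest]])  # (x, y)

    return box, point


# Toy coarse mask: a 3x4 foreground rectangle on an 8x8 grid.
toy_mask = np.zeros((8, 8), dtype=np.float32)
toy_mask[2:5, 3:7] = 0.9
toy_box, toy_point = mask_to_prompts(toy_mask)
print(toy_box)  # [3 2 6 4]
```

Snapping the centroid to a foreground pixel is one simple way to keep the point prompt valid for irregular shapes; other strategies, such as sampling several points by confidence, fit the same interface.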
