ProtoSAM: One-Shot Medical Image Segmentation With Foundational Models

Lev Ayzenberg, Raja Giryes, Hayit Greenspan

TL;DR

This work addresses the challenge of label-efficient medical image segmentation by introducing ProtoSAM, a one-shot framework that combines prototypical networks with the Segment Anything Model. A DINOv2-encoder-based ALPNet creates an initial coarse segmentation from a single support image, from which prompts (points and bounding boxes) are extracted to guide SAM for refined, one-shot segmentation. The approach achieves strong results across abdominal CT/MRI and polyp datasets, often outperforming supervised and other foundation-model baselines, and can be further boosted via encoder fine-tuning (EFT). ProtoSAM demonstrates the potential of leveraging large-scale foundation models for rapid, label-efficient medical image analysis, with future work focusing on improved prompt generation and broader task applicability.

Abstract

This work introduces ProtoSAM, a new framework for one-shot medical image segmentation. It combines prototypical networks, an established approach to few-shot segmentation, with SAM, a foundation model for natural images. The proposed method first creates a coarse segmentation mask using the ALPNet prototypical network, augmented with a DINOv2 encoder. Prompts such as points and bounding boxes are then extracted from this initial mask and input into the Segment Anything Model (SAM). State-of-the-art results are shown on several medical image datasets, demonstrating automated segmentation from a single image example (one shot) with no need to fine-tune the foundation model. Our code is available at: https://github.com/levayz/ProtoSAM
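
The prompt-extraction step can be illustrated with a small sketch. The abstract does not say how points and boxes are derived from the coarse mask, so the choices here are assumptions made for illustration: the box is the tight bounding box of the foreground, and the point is the foreground pixel closest to the mask centroid. The function name `extract_prompts` is hypothetical and is not taken from the ProtoSAM repository.

```python
import numpy as np

def extract_prompts(coarse_mask):
    """Derive one point prompt and one box prompt from a binary mask.

    Assumed heuristic (not necessarily the paper's): tight bounding box
    plus the foreground pixel nearest the centroid.
    Returns ((x, y), (x_min, y_min, x_max, y_max)), or (None, None)
    when the mask has no foreground.
    """
    ys, xs = np.nonzero(coarse_mask)
    if ys.size == 0:
        return None, None
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    # The centroid of a non-convex mask can fall on background,
    # so snap it to the nearest foreground pixel.
    cy, cx = ys.mean(), xs.mean()
    nearest = int(np.argmin((ys - cy) ** 2 + (xs - cx) ** 2))
    point = (int(xs[nearest]), int(ys[nearest]))
    return point, box

# Toy coarse mask: a 3x4 rectangle of foreground in an 8x8 image.
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:7] = True
point, box = extract_prompts(mask)
empty_point, empty_box = extract_prompts(np.zeros((8, 8), dtype=bool))
```

The point is returned as (x, y) and the box in XYXY order, which is the layout SAM's reference predictor expects for its point and box prompts.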

Paper Structure (8 sections, 10 equations, 2 figures, 4 tables)

This paper contains 8 sections, 10 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: ProtoSAM Framework: A DINOv2 encoder derives features from the query and support images. Foreground and background prototypes are computed from the support features and the support mask by the ALP module. An initial segmentation is obtained by comparing these prototypes with the query features using cosine similarity. Prompts are then extracted from this initial prediction to guide SAM toward a refined segmentation.
  • Figure 2: (Left) MRI segmentation. Top to bottom: SSL-Dinov2 + CCA, SAM (best mask), ProtoSAM, Ground Truth, Query Image. Left to right: RK (right kidney), LK (left kidney), Spleen, Liver. (Right) Polyp segmentation. Top to bottom: SAM (best mask), ProtoMedSAM, ProtoSAM, Ground Truth, Query Image.
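
The prototype-matching step described in the Figure 1 caption can be sketched as follows. This is a deliberately simplified illustration, not the ALPNet implementation: it uses a single global foreground prototype and a single global background prototype obtained by masked average pooling, whereas ALPNet pools local prototypes as well. The feature maps stand in for DINOv2 encoder outputs, and all function names are hypothetical.

```python
import numpy as np

def masked_average_prototype(features, mask):
    """Mean feature vector over the pixels selected by `mask`.

    features: (H, W, C) array; mask: (H, W) boolean array.
    """
    return features[mask].mean(axis=0)

def cosine_similarity_map(features, prototype, eps=1e-8):
    """Cosine similarity between every pixel feature and one prototype."""
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + eps)
    p = prototype / (np.linalg.norm(prototype) + eps)
    return f @ p  # shape (H, W)

def coarse_segment(query_feats, support_feats, support_mask):
    """Label a query pixel foreground when it is closer to the
    foreground prototype than to the background prototype."""
    fg = masked_average_prototype(support_feats, support_mask)
    bg = masked_average_prototype(support_feats, ~support_mask)
    return cosine_similarity_map(query_feats, fg) > cosine_similarity_map(query_feats, bg)

# Toy data: foreground pixels carry feature [1, 0], background [0, 1].
support_mask = np.zeros((4, 4), dtype=bool)
support_mask[1:3, 1:3] = True
query_mask = np.zeros((4, 4), dtype=bool)
query_mask[0:2, 2:4] = True
support_feats = np.where(support_mask[..., None], [1.0, 0.0], [0.0, 1.0])
query_feats = np.where(query_mask[..., None], [1.0, 0.0], [0.0, 1.0])

pred = coarse_segment(query_feats, support_feats, support_mask)
```

On this toy input the prediction recovers the query object even though it sits in a different location from the support object, because matching is done in feature space rather than by position.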