Rethinking Few-Shot Medical Image Segmentation by SAM2: A Training-Free Framework with Augmentative Prompting and Dynamic Matching

Haiyue Zu, Jun Ge, Heting Xiao, Jile Xie, Zhangzhe Zhou, Yifan Meng, Jiayi Ni, Junjie Niu, Linlin Zhang, Li Ni, Huilin Yang

TL;DR

This paper addresses the high data and labeling costs of medical image segmentation by introducing a training-free few-shot medical image segmentation (FSMIS) framework that exploits SAM2's video segmentation capability. The method treats each 3D volume as a video sequence and matches every query slice against an augmented support set; it then prompts SAM2 with the most perceptually similar support image and its mask to segment that slice, without any model updates. The pipeline has three stages (Support Set Construction, Support-Query Matching, Prompt-Driven Segmentation), and the paper reports state-of-the-art Dice scores on the Synapse-CT, CHAOS-MRI, and CMR datasets, along with gains in annotation efficiency. The approach is a plug-and-play strategy for 3D medical image segmentation that can extend to other video segmentation models and reduce dependence on large labeled datasets.

Abstract

The reliance on large labeled datasets presents a significant challenge in medical image segmentation. Few-shot learning offers a potential solution, but existing methods often still require substantial training data. This paper proposes a novel approach that leverages the Segment Anything Model 2 (SAM2), a vision foundation model with strong video segmentation capabilities. We conceptualize 3D medical image volumes as video sequences, departing from the traditional slice-by-slice paradigm. Our core innovation is a support-query matching strategy: we perform extensive data augmentation on a single labeled support image and, for each frame in the query volume, algorithmically select the most analogous augmented support image. This selected image, along with its corresponding mask, is used as a mask prompt, driving SAM2's video segmentation. This approach entirely avoids model retraining or parameter updates. We demonstrate state-of-the-art performance on benchmark few-shot medical image segmentation datasets, achieving significant improvements in accuracy and annotation efficiency. This plug-and-play method offers a powerful and generalizable solution for 3D medical image segmentation.
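
The augmentation and matching steps described in the abstract can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration, not the authors' implementation: images are small nested lists, the geometric transforms are limited to a horizontal flip and an integer shift, the intensity change is a simple brightness gain, and `mean_squared_distance` is a crude stand-in for the perceptual metric (LPIPS) that the paper uses. The function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # 2D grayscale slice, values in [0, 1]
Mask = List[List[int]]     # 2D binary mask, same shape as the image


def hflip(grid: list) -> list:
    """Mirror a 2D grid left to right (a simple geometric transform)."""
    return [list(reversed(row)) for row in grid]


def shift_right(grid: list, k: int, fill) -> list:
    """Translate a 2D grid k pixels to the right (0 <= k <= width)."""
    return [[fill] * k + row[:len(row) - k] for row in grid]


def jitter_brightness(image: Image, gain: float) -> Image:
    """Scale intensities and clip to [0, 1]; applied to the image only."""
    return [[min(1.0, max(0.0, v * gain)) for v in row] for row in image]


def build_support_set(
    image: Image,
    mask: Mask,
    shifts: Sequence[int] = (0, 1),
    gains: Sequence[float] = (0.8, 1.0, 1.2),
) -> List[Tuple[Image, Mask]]:
    """Return augmented (image, mask) pairs from one labeled support pair."""
    pairs = []
    for flip in (False, True):
        for k in shifts:
            # Geometric transforms are applied identically to image and mask.
            img = hflip(image) if flip else image
            msk = hflip(mask) if flip else mask
            img, msk = shift_right(img, k, 0.0), shift_right(msk, k, 0)
            for gain in gains:
                # Intensity jitter changes the image but never the mask.
                pairs.append((jitter_brightness(img, gain), msk))
    return pairs


def mean_squared_distance(a: Image, b: Image) -> float:
    """Assumed stand-in for LPIPS: mean squared pixel difference."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def match_support(
    query_slices: Sequence[Image],
    support_set: Sequence[Tuple[Image, Mask]],
    distance: Callable[[Image, Image], float] = mean_squared_distance,
) -> List[int]:
    """For each query slice, return the index of the closest support image."""
    matches = []
    for q in query_slices:
        dists = [distance(q, img) for img, _ in support_set]
        matches.append(min(range(len(dists)), key=dists.__getitem__))
    return matches
```

Passing a different `distance` callable (for example a wrapper around a real LPIPS implementation) would leave the selection logic unchanged, since the matching step only needs a lower-is-more-similar score per support image.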

Paper Structure

This paper contains 25 sections, 12 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Visual comparison between conventional FSMIS methodologies and our novel paradigm. The former necessitates substantial meta-training using large annotated medical image repositories, whereas our framework capitalizes on the video segmentation prowess of SAM2 to eliminate training requirements, thereby significantly mitigating data labeling and computational training expenditures.
  • Figure 2: Overview of our proposed framework. The process involves: (1) Constructing a diverse support set through comprehensive augmentation of the single labeled support image; (2) Performing per-slice Support-Query Matching to identify the most perceptually similar augmented support image; (3) Utilizing SAM2 for Prompt-Driven Segmentation by treating the 3D query volume as a video sequence, with matched support images serving as mask prompts. This approach leverages SAM2's video segmentation capabilities without requiring any retraining.
  • Figure 3: Schematic representation of the support set construction pipeline. This process involves the sequential application of affine transformations to both the support image and its corresponding mask, followed by color jittering applied solely to the image, thereby creating a diverse set of augmented support pairs.
  • Figure 4: Illustration of the support-query matching process. For each query image $I^q_j$, LPIPS is computed against all support images within the enhanced support set $\mathcal{S}'$, and the support image ${I'}^s_{i^*(j)}$ yielding the lowest LPIPS is identified as the best match.
  • Figure 5: Illustration of the prompt-driven segmentation with SAM2. For each query slice, a two-frame input sequence is constructed, comprising the current query slice and the best-matching support image. The segmentation mask of the support image serves as a prompt for SAM2, enabling the propagation of segmentation to the query slice.
  • ...and 4 more figures
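
The prompt-driven step in the Figure 5 caption can be sketched as a loop that builds one two-frame sequence per query slice. This is a hedged illustration only: `VideoSegmenter` is a hypothetical interface standing in for a video segmentation model such as SAM2 and is not SAM2's actual API, `copy_prompt_segmenter` is a toy placeholder, and putting the support image first in the sequence is an assumption (the caption does not state the frame order). The `matches` list is assumed to hold the best-match index $i^*(j)$ for each query slice $j$.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # 2D grayscale slice
Mask = List[List[int]]     # 2D binary mask

# Hypothetical interface: receives [prompt_frame, query_frame] plus the mask
# of the prompt frame, and returns the mask propagated to the query frame.
VideoSegmenter = Callable[[List[Image], Mask], Mask]


def segment_volume(
    query_slices: Sequence[Image],
    support_set: Sequence[Tuple[Image, Mask]],
    matches: Sequence[int],
    segmenter: VideoSegmenter,
) -> List[Mask]:
    """Segment every query slice using its matched support pair as prompt."""
    predictions = []
    for query, i_star in zip(query_slices, matches):
        support_image, support_mask = support_set[i_star]
        # Frame 0 carries the mask prompt; frame 1 is the slice to segment.
        frames = [support_image, query]
        predictions.append(segmenter(frames, support_mask))
    return predictions


def copy_prompt_segmenter(frames: List[Image], prompt_mask: Mask) -> Mask:
    """Toy stand-in: copies the prompt mask instead of running a model."""
    if len(frames) != 2:
        raise ValueError("expected a two-frame sequence")
    return [row[:] for row in prompt_mask]
```

Because the segmenter is passed in as a callable, the same loop would work with any model that can propagate a mask from one frame to the next, which is consistent with the paper's remark that the strategy can extend to other video segmentation models.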