Rethinking Few-Shot Medical Image Segmentation by SAM2: A Training-Free Framework with Augmentative Prompting and Dynamic Matching

Haiyue Zu; Jun Ge; Heting Xiao; Jile Xie; Zhangzhe Zhou; Yifan Meng; Jiayi Ni; Junjie Niu; Linlin Zhang; Li Ni; Huilin Yang

TL;DR

This paper addresses the high data and labeling costs of medical image segmentation by introducing a training-free few-shot medical image segmentation (FSMIS) framework that exploits SAM2's video segmentation capability. It treats each 3D volume as a video sequence and, for every query slice, selects the most perceptually similar image from an augmented support set; that image and its mask then prompt SAM2 to segment the slice without any model updates. The pipeline has three stages (Support Set Construction, Support-Query Matching, Prompt-Driven Segmentation) and demonstrates state-of-the-art Dice scores on the Synapse-CT, CHAOS-MRI, and CMR datasets, along with notable gains in annotation efficiency. The approach offers a general, plug-and-play strategy for 3D medical image segmentation that can extend to other video segmentation models and reduce dependence on large labeled datasets.

Abstract

The reliance on large labeled datasets presents a significant challenge in medical image segmentation. Few-shot learning offers a potential solution, but existing methods often still require substantial training data. This paper proposes a novel approach that leverages the Segment Anything Model 2 (SAM2), a vision foundation model with strong video segmentation capabilities. We conceptualize 3D medical image volumes as video sequences, departing from the traditional slice-by-slice paradigm. Our core innovation is a support-query matching strategy: we perform extensive data augmentation on a single labeled support image and, for each frame in the query volume, algorithmically select the most analogous augmented support image. This selected image, along with its corresponding mask, is used as a mask prompt, driving SAM2's video segmentation. This approach entirely avoids model retraining or parameter updates. We demonstrate state-of-the-art performance on benchmark few-shot medical image segmentation datasets, achieving significant improvements in accuracy and annotation efficiency. This plug-and-play method offers a powerful and generalizable solution for 3D medical image segmentation.
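
The support-query matching idea can be illustrated with a minimal sketch. This is not the authors' implementation: the augmentations (rotations and flips), the similarity measure (mean-centered cosine similarity on raw pixels, standing in for a perceptual metric), and the function names `augment_support`, `similarity`, and `match_support` are all assumptions made for illustration. The SAM2 prompting step is deliberately left out; the selected image and mask pair is what would be handed to it as a mask prompt.

```python
import numpy as np


def augment_support(image, mask):
    """Build an augmented support set (hypothetical choice of augmentations):
    four 90-degree rotations, each with and without a horizontal flip.
    The same transform is applied to the image and to its mask."""
    pairs = []
    for k in range(4):
        rot_img, rot_mask = np.rot90(image, k), np.rot90(mask, k)
        pairs.append((rot_img, rot_mask))
        pairs.append((np.fliplr(rot_img), np.fliplr(rot_mask)))
    return pairs


def similarity(a, b):
    """Mean-centered cosine similarity on pixels (an assumed stand-in for a
    perceptual metric). Shape mismatches score -inf so they are never chosen."""
    if a.shape != b.shape:
        return float("-inf")
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def match_support(query_slice, support_pairs):
    """Return the index and the (image, mask) pair most similar to the query."""
    scores = [similarity(query_slice, img) for img, _ in support_pairs]
    best = int(np.argmax(scores))
    return best, support_pairs[best]


# Toy data: one labeled support image and a query slice that is a rotated copy.
rng = np.random.default_rng(0)
support_img = rng.random((8, 8))
support_mask = (support_img > 0.5).astype(np.uint8)

pairs = augment_support(support_img, support_mask)
query = np.rot90(support_img, 1)
idx, (best_img, best_mask) = match_support(query, pairs)
```

In a full pipeline, `match_support` would run once per slice of the query volume, so each frame gets its own best-matching support pair as the prompt.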
