RevSAM2: Prompt SAM2 for Medical Image Segmentation via Reverse-Propagation without Fine-tuning

Yunhao Bai, Boxiang Yun, Zeli Chen, Qinji Yu, Yingda Xia, Yan Wang

TL;DR

RevSAM2 is a simple yet effective self-correction framework that enables SAM2 to achieve superior performance on unseen 3D medical image segmentation tasks without fine-tuning. It is the first work to explore the potential of SAM2 in label-efficient medical image segmentation without fine-tuning.

Abstract

The Segment Anything Model 2 (SAM2) has recently demonstrated exceptional performance in zero-shot prompt segmentation for natural images and videos. However, when the propagation mechanism of SAM2 is applied to medical images, it often produces spatial inconsistencies, yielding significantly different segmentation outcomes for very similar images. In this paper, we introduce RevSAM2, a simple yet effective self-correction framework that enables SAM2 to achieve superior performance on unseen 3D medical image segmentation tasks without fine-tuning. Specifically, to segment a 3D query volume using a limited number of support image-label pairs that define a new segmentation task, we propose a reverse propagation strategy as a query information selection mechanism. Instead of simply maintaining a first-in-first-out (FIFO) queue of memories to predict query slices sequentially, reverse propagation selects high-quality query information by leveraging support images to evaluate the quality of each predicted query slice mask. The selected high-quality masks are then used as prompts to propagate across the entire query volume, thereby enhancing generalization to unseen tasks. Notably, we are the first to explore the potential of SAM2 in label-efficient medical image segmentation without fine-tuning. Compared to fine-tuning on large labeled datasets, the label-efficient scenario provides a cost-effective alternative for medical segmentation tasks, particularly for rare diseases or unseen classes. Experiments on four public datasets demonstrate the superiority of RevSAM2 in scenarios with limited labels, surpassing state-of-the-art methods by 12.18% in Dice. The code will be released.
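The selection mechanism described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: masks are modelled as sets of foreground pixel coordinates, `propagate` is a hypothetical stand-in for SAM2's memory-based prediction (it is not a SAM2 API call), and the top-`k` selection rule is an assumption, since the abstract does not state how "high-quality" masks are chosen.

```python
from statistics import mean


def dice(a, b):
    """Dice overlap between two masks given as sets of (row, col) pixels."""
    if not a and not b:
        return 1.0  # two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def reverse_scores(supports, queries, propagate):
    """Score each query slice by propagating its prediction back to the supports.

    supports:  list of (support_image, support_mask) pairs
    queries:   list of query slices
    propagate: hypothetical callable (memory, target) -> predicted mask,
               where memory is a list of (image, mask) pairs
    Returns a list of (score, predicted_mask), one entry per query slice.
    """
    results = []
    for q in queries:
        # Forward propagation: supports act as memory to predict the query mask.
        p = propagate(supports, q)
        # Reverse propagation: the query and its prediction act as memory
        # to re-predict every support mask, which has a known label.
        score = mean(dice(propagate([(q, p)], s), y) for s, y in supports)
        results.append((score, p))
    return results


def select_top_k(results, k):
    """Indices of the k best-scoring query slices (assumed selection rule)."""
    order = sorted(range(len(results)), key=lambda i: results[i][0], reverse=True)
    return sorted(order[:k])
```

The key design point is that the query slices have no labels, so their predictions cannot be scored directly; the supports do have labels, so the reverse prediction can be scored against them and used as a proxy for the quality of each query mask.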

Paper Structure (14 sections, 11 equations, 4 figures, 6 tables)


Figures (4)

  • Figure 1: Example of reverse propagation (left) and its illustration in feature space (right). $\textbf{S}$ is a CT slice and $\textbf{Y}$ is its segmentation mask, while $\textbf{q}_1$ and $\textbf{q}_2$ are two adjacent CT slices from a different CT scan than $\textbf{S}$. The pancreatic tail region in all three images is outlined in yellow. The prediction masks $\textbf{p}_1$ and $\textbf{p}_2$ correspond to $\textbf{q}_1$ and $\textbf{q}_2$, respectively, and are generated from the memory bank that stores the features of $\textbf{S}$ and $\textbf{Y}$. Conversely, $\widetilde{\textbf{Y}}_1$ ($\widetilde{\textbf{Y}}_2$) is the prediction mask for $\textbf{S}$ generated from the memory bank filled with the features of $\textbf{p}_1$ and $\textbf{q}_1$ ($\textbf{p}_2$ and $\textbf{q}_2$). In our framework, $\textbf{p}_1$ is discarded, and $\textbf{q}_2$ together with $\textbf{p}_2$ is used to support the re-segmentation of $\textbf{q}_1$.
  • Figure 2: The overall framework of RevSAM2 (a) and an illustration of forward propagation and reverse propagation (b). To evaluate the quality of the prediction $\textbf{p}_i$ obtained by forward propagating $\textbf{S}$ and $\textbf{Y}$ onto $\textbf{q}_i$, we reverse propagate $\textbf{q}_i$ and $\textbf{p}_i$ back to $\textbf{S}$ to obtain $\widetilde{\textbf{Y}}_i$, calculate the average Dice $\pi_i$ between $\widetilde{\textbf{Y}}_i$ and $\textbf{Y}$, and treat it as the metric for evaluating the accuracy of $\textbf{p}_i$.
  • Figure 3: Illustration of query self-propagation. In query self-propagation, the memory bank permanently stores the features of the conditional slices selected by reverse propagation, while maintaining a FIFO queue that stores the features of non-conditional slices during internal query inference.
  • Figure 4: Line charts of $\pi$ (%) versus the actual Dice (%) of $\textbf{p}$ on the BTCV dataset when the number of supports is 10, 5, and 1.
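
The memory layout described in the Figure 3 caption can be sketched as a small data structure. This is an illustrative sketch only: the class name `QueryMemoryBank`, its method names, and the default queue length are hypothetical and are not taken from the paper or from the SAM2 codebase.

```python
from collections import deque


class QueryMemoryBank:
    """Toy memory bank: persistent conditional entries plus a bounded FIFO.

    Conditional entries (slices selected by reverse propagation) are never
    evicted. Non-conditional entries live in a fixed-length queue, so the
    oldest one is dropped when a new one arrives and the queue is full.
    """

    def __init__(self, fifo_size=6):  # fifo_size is an arbitrary illustrative value
        self.conditional = []
        self.recent = deque(maxlen=fifo_size)

    def add_conditional(self, features):
        """Store features of a selected (conditional) slice permanently."""
        self.conditional.append(features)

    def add_non_conditional(self, features):
        """Store features of an ordinary slice; may evict the oldest one."""
        self.recent.append(features)

    def read(self):
        """All memories visible to the next prediction, conditional first."""
        return self.conditional + list(self.recent)


bank = QueryMemoryBank(fifo_size=2)
bank.add_conditional("cond_slice")
for name in ["slice_1", "slice_2", "slice_3"]:
    bank.add_non_conditional(name)
print(bank.read())  # ['cond_slice', 'slice_2', 'slice_3']
```

Using `deque(maxlen=...)` gives the first-in-first-out eviction for free, while the separate list keeps the selected slices available for every later prediction.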