Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning

Haofeng Liu; Erli Zhang; Junde Wu; Mingxuan Hong; Yueming Jin

TL;DR

The paper addresses real-time surgical video segmentation by reducing SAM2’s computational burden. It introduces SurgSAM2, which couples SAM2 with an Efficient Frame Pruning (EFP) mechanism that uses cosine similarity to keep a compact, dynamic memory bank of the most informative frames. SurgSAM2 reaches 3$\times$ the FPS of SAM2 and, on EndoVis17 and EndoVis18, achieves competitive or improved segmentation accuracy, particularly after fine-tuning with lower-resolution data. This memory-efficient approach enables real-time, instance-level instrument segmentation in resource-constrained surgical environments and suggests broader applicability to surgical video analysis.

Abstract

Surgical video segmentation is a critical task in computer-assisted surgery and is vital for enhancing surgical quality and patient outcomes. Recently, the Segment Anything Model 2 (SAM2) framework has shown strong advances in image and video segmentation. However, SAM2 struggles with efficiency because of the high computational cost of processing high-resolution images and the complex, long-range temporal dynamics of surgical videos. To address these challenges, we introduce Surgical SAM 2 (SurgSAM2), a model that combines SAM2 with an Efficient Frame Pruning (EFP) mechanism to enable real-time surgical video segmentation. The EFP mechanism dynamically manages the memory bank by retaining only the most informative frames, reducing memory usage and computational cost while maintaining high segmentation accuracy. Our extensive experiments demonstrate that SurgSAM2 significantly improves both efficiency and segmentation accuracy compared to the vanilla SAM2. Remarkably, SurgSAM2 achieves 3$\times$ the FPS of SAM2, while also delivering state-of-the-art performance after fine-tuning with lower-resolution data. These advancements establish SurgSAM2 as a leading model for surgical video analysis, making real-time surgical video segmentation in resource-constrained environments a reality. Our source code is available at https://github.com/jinlab-imvr/Surgical-SAM-2.
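To make the idea of similarity-based memory pruning concrete, the following is a minimal sketch and not the authors' implementation. It assumes each frame is summarized by a flat feature vector, that the memory bank has a fixed capacity, and that the stored frame most similar (by cosine similarity) to the newest frame is the most redundant and is dropped first. The names `cosine_similarity`, `prune_memory_bank`, and `capacity` are hypothetical, and the exact selection rule used by EFP may differ.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prune_memory_bank(
    bank: List[Sequence[float]],
    new_frame: Sequence[float],
    capacity: int,
) -> List[Sequence[float]]:
    """Add new_frame to the bank, then drop redundant older frames.

    Assumption (illustrative only): the older frame with the highest cosine
    similarity to the newest frame carries the least new information.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    updated = list(bank) + [new_frame]
    while len(updated) > capacity:
        older = updated[:-1]  # the newest frame is always kept
        sims = [cosine_similarity(f, new_frame) for f in older]
        most_redundant = max(range(len(sims)), key=sims.__getitem__)
        del updated[most_redundant]
    return updated


# Usage: a bank of two frames at capacity 2 receives a third frame.
bank = [[1.0, 0.0], [0.0, 1.0]]
new_frame = [1.0, 0.1]
pruned = prune_memory_bank(bank, new_frame, capacity=2)
```

In this toy case the first stored frame points in almost the same direction as the new frame, so it is the one removed, while the dissimilar frame stays in the bank. A real pipeline would compute similarities on encoder feature maps rather than two-element lists.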

Paper Structure

This paper contains 19 sections, 1 equation, 2 figures, and 4 tables.

Introduction
Related work
Surgical Instrument Segmentation
Segment Anything Model 2
Memory Bank Restriction
Methods
SurgSAM2 Architecture
Efficient Frame Pruning
Implementation Details
Experiment
Dataset
Evaluation Metrics
Experimental Results
Evaluation on Model Efficiency
Evaluation on Model Accuracy
...and 4 more sections

Figures (2)

Figure 1: Architecture of the proposed model SurgSAM2.
Figure 2: Visual comparison between SAM2 and SurgSAM2 on the EndoVis18 dataset.
