From Generalization to Precision: Exploring SAM for Tool Segmentation in Surgical Environments

Kanyifeechukwu J. Oguine; Roger D. Soberanis-Mukul; Nathan Drenkow; Mathias Unberath

TL;DR

This work examines the zero-shot generalization of the Segment Anything Model (SAM) for surgical tool segmentation in endoscopic imagery, where training data is scarce and image artifacts degrade performance. It compares selecting the single best SAM mask against aggregating all sub-masks that intersect the tool region of interest (ROI), using a threshold $i_r > 0.5$ and a combined mask defined by $M_{comb} = M_1 + M_2 + \cdots + M_n$. Evaluations on EndoVis17, EndoVis18, and an In-House dataset with synthetic and real corruptions show that combining masks improves IoU over single-mask predictions, especially under challenging perturbations such as zoom blur, while single masks remain competitive on clean images. The results underscore the importance of prompt design when deploying foundation segmentation models in medical settings and suggest that SAM’s tendency to over-segment can be mitigated by aggregating overlapping masks.

Abstract

Purpose: Accurate tool segmentation is essential in computer-aided procedures. However, the task is challenging because of image artifacts and the limited training data available in medical scenarios. Methods that generalize to unseen data are a promising avenue, and zero-shot segmentation offers a way to cope with limited data. Initial exploratory work with the Segment Anything Model (SAM) shows that bounding-box-based prompting yields notable zero-shot generalization. However, point-based prompting leads to degraded performance that deteriorates further under image corruption. We argue that SAM drastically over-segments images with high corruption levels, which degrades performance when only a single segmentation mask is considered, whereas combining the masks that overlap the object of interest yields an accurate prediction.

Method: We use SAM to generate over-segmented predictions of endoscopic frames. We then use the ground-truth tool mask to analyze SAM's results in two settings: when the best single mask is selected as the prediction, and when all individual masks overlapping the object of interest are combined into the final predicted mask. We analyze the EndoVis18 and EndoVis17 instrument segmentation datasets using synthetic corruptions of various strengths, and an In-House dataset featuring counterfactually created real-world corruptions.

Results: Combining the over-segmented masks improves the IoU. Furthermore, selecting the best single segmentation yields a competitive IoU score on clean images.

Conclusions: Combined SAM predictions show improved results and robustness up to a certain corruption level. However, appropriate prompting strategies are fundamental for implementing these models in the medical domain.
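The comparison between a single best mask and a combined mask can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. In particular, it assumes that $i_r$ is the fraction of a SAM sub-mask's pixels that fall inside the ground-truth tool mask, and that the combination $M_1 + \cdots + M_n$ acts as a pixel-wise union of binary masks. The function names (`iou`, `best_single_mask`, `combined_mask`) and the toy arrays are hypothetical.

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as agreement (assumed convention)
    return np.logical_and(pred, gt).sum() / union

def best_single_mask(masks, gt):
    """Oracle selection: the one SAM mask with the highest IoU against gt."""
    return max(masks, key=lambda m: iou(m, gt))

def combined_mask(masks, gt, threshold=0.5):
    """Union of all sub-masks whose overlap ratio i_r exceeds the threshold.

    Assumption: i_r = |M_i AND gt| / |M_i|, i.e. the share of the sub-mask
    that lies on the tool. The source does not spell out this definition.
    """
    combined = np.zeros_like(gt, dtype=bool)
    for m in masks:
        m = m.astype(bool)
        area = m.sum()
        if area == 0:
            continue  # skip empty sub-masks
        i_r = np.logical_and(m, gt).sum() / area
        if i_r > threshold:
            combined |= m
    return combined

# Toy example: the tool occupies rows 1-2 of a 4x6 frame, and the
# over-segmentation splits it into a left half and a right half.
gt = np.zeros((4, 6), dtype=bool)
gt[1:3, :] = True

left = np.zeros_like(gt)
left[1:3, 0:3] = True            # fully on the tool, i_r = 1.0
right = np.zeros_like(gt)
right[1:3, 3:6] = True           # fully on the tool, i_r = 1.0
background = np.zeros_like(gt)
background[0, :] = True          # off the tool, i_r = 0.0
partial = np.zeros_like(gt)
partial[2:4, 4:6] = True         # half on the tool, i_r = 0.5 (not > 0.5)

masks = [left, right, background, partial]
single = best_single_mask(masks, gt)
combined = combined_mask(masks, gt)

print("single IoU:  ", iou(single, gt))
print("combined IoU:", iou(combined, gt))
```

In this toy case the tool is split across two sub-masks, so any single mask covers at most half of it, while the union of the qualifying sub-masks recovers the whole tool. Both strategies rely on the ground-truth mask, so they serve as an analysis of what SAM's output contains, not as a deployable prompting scheme.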

Paper Structure (4 sections, 3 equations, 7 figures, 1 table)


Introduction
Method
Results
Conclusions

Figures (7)

Figure 1: Examples of five types of corruptions applied to the images with their corresponding severity level. Images from our In-House dataset.
Figure 2: Overlay of (a) single and (b) combined prediction masks for different corruption types. True positives, false positives, and false negatives are indicated in green, red, and blue, respectively. Samples from the three datasets are presented.
Figure 3: Average IoU for the 18 types of corruption, with five levels of severity, for the Single vs. Combined SAM segmentation mask in the In-House dataset.
Figure 4: Average IoU for the 18 types of corruption, with five levels of severity, for the Single vs. Combined SAM segmentation mask in the EndoVis17 dataset.
Figure 5: Average IoU for the 18 types of corruption, with five levels of severity, for the Single vs. Combined SAM segmentation mask in the EndoVis18 dataset.
...and 2 more figures
