When Tracking Fails: Analyzing Failure Modes of SAM2 for Point-Based Tracking in Surgical Videos

Woowon Jang; Jiwon Im; Juseung Choi; Niki Rashidian; Wesley De Neve; Utku Ozbulak

TL;DR

The paper investigates how reliable point-based tracking with SAM2 is in surgical videos, focusing on laparoscopic cholecystectomy. It systematically compares point-based initialization with segmentation-mask initialization across three targets (gallbladder, grasper, L-hook), using three point-placement strategies and multiple point counts on a subset of CholecSeg8k. Results show that anatomical targets, particularly the gallbladder, suffer from boundary ambiguity and tissue similarity, while surgical tools are tracked more effectively with points; increasing the number of points helps but does not fully close the gap for anatomical targets. The study offers concrete recommendations for point placement and suggests negative points as future work for improving robustness in complex surgical scenes.

Abstract

Video object segmentation (VOS) models such as SAM2 offer promising zero-shot tracking capabilities for surgical videos using minimal user input. Among the available input types, point-based tracking offers an efficient and low-cost alternative, yet its reliability and failure cases in complex surgical environments are not well understood. In this work, we systematically analyze the failure modes of point-based tracking in laparoscopic cholecystectomy videos. Focusing on three surgical targets, the gallbladder, grasper, and L-hook electrocautery, we compare the performance of point-based tracking with segmentation mask initialization. Our results show that point-based tracking is competitive for surgical tools but consistently underperforms for anatomical targets, where tissue similarity and ambiguous boundaries lead to failure. Through qualitative analysis, we reveal key factors influencing tracking outcomes and provide several actionable recommendations for selecting and placing tracking points to improve performance in surgical video analysis.
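
To make the notion of point-based initialization concrete, the following is a minimal sketch of how prompt points might be drawn from a ground-truth mask on the first frame. The function name `sample_prompt_points` and the two strategies shown ("random" and "center") are illustrative assumptions; the summary above does not name the three placement strategies actually evaluated. The (x, y) output order and the label value 1 for positive points are also assumptions about the downstream tracker's input format.

```python
import numpy as np


def sample_prompt_points(mask, k, strategy="random", seed=0):
    """Return k (x, y) points that lie inside a binary mask.

    Hypothetical helper: the strategies here are illustrative and are
    not claimed to match the ones evaluated in the paper.
    """
    ys, xs = np.nonzero(mask)  # pixel coordinates of the target, row-major order
    if ys.size == 0:
        raise ValueError("mask is empty")
    if k > ys.size:
        raise ValueError("k exceeds the number of mask pixels")

    if strategy == "random":
        # Uniform sampling over target pixels, without replacement.
        rng = np.random.default_rng(seed)
        idx = rng.choice(ys.size, size=k, replace=False)
    elif strategy == "center":
        # The k target pixels closest to the mask centroid.
        cy, cx = ys.mean(), xs.mean()
        d2 = (ys - cy) ** 2 + (xs - cx) ** 2
        idx = np.argsort(d2, kind="stable")[:k]
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    # Stack as (x, y) pairs; assumed order for point prompts.
    return np.stack([xs[idx], ys[idx]], axis=1).astype(np.float32)


# Toy first-frame mask: a 4x4 square target in an 8x8 frame.
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 3:7] = True

points = sample_prompt_points(mask, k=3, strategy="random")
labels = np.ones(len(points), dtype=np.int32)  # 1 = positive point (assumed convention)
center = sample_prompt_points(mask, k=1, strategy="center")
```

Sampling only from pixels inside the mask guarantees that every prompt lands on the target. It does not address the boundary ambiguity reported for anatomical targets, where points near the edge of the gallbladder may be harder for the tracker to interpret than points on a tool.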
