Memory-Enhanced SAM3 for Occlusion-Robust Surgical Instrument Segmentation
Valay Bundele, Mehran Hosseinzadeh, Hendrik P. A. Lensch
TL;DR
This work tackles occlusion and long-term identity preservation in surgical instrument segmentation by extending SAM3 with ReMeDI, a training-free framework. It introduces a dual-memory design (relevance-aware and occlusion-aware), a memory expansion based on piecewise interpolation of temporal encodings, and a feature-based re-identification module that uses temporal voting to correct identities after disocclusion. In a zero-shot setting, the approach improves absolute mcIoU over vanilla SAM3 by around 7% on EndoVis17 and 16% on EndoVis18 and reduces false positives, all without retraining. The result is more robust and reliable instrument tracking in challenging endoscopic videos, which supports intraoperative guidance without domain-specific fine-tuning.
Abstract
Accurate surgical instrument segmentation in endoscopic videos is crucial for computer-assisted interventions, yet it remains challenging due to frequent occlusions, rapid motion, specular artefacts, and long-term instrument re-entry. While SAM3 provides a powerful spatio-temporal framework for video object segmentation, its performance in surgical scenes is limited by indiscriminate memory updates, fixed memory capacity, and weak identity recovery after occlusions. We propose ReMeDI-SAM3, a training-free, memory-enhanced extension of SAM3 that addresses these limitations through three components: (i) relevance-aware memory filtering with a dedicated occlusion-aware memory for storing pre-occlusion frames, (ii) a piecewise interpolation scheme that expands the effective memory capacity, and (iii) a feature-based re-identification module with temporal voting for reliable post-occlusion identity disambiguation. Together, these components mitigate error accumulation and enable reliable recovery after occlusions. Evaluations on EndoVis17 and EndoVis18 in a zero-shot setting show absolute mcIoU improvements of around 7% and 16%, respectively, over vanilla SAM3, outperforming even prior training-based approaches. Project page: https://valaybundele.github.io/remedi-sam3/.
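To make component (iii) more concrete, the following is a minimal illustrative sketch of feature-based re-identification with temporal voting. It is not the authors' implementation: the class name `TemporalVotingReID`, the use of cosine similarity against one stored feature vector per instrument, the window size, and the simple majority vote are all assumptions chosen for illustration. The abstract does not specify how features are extracted or how votes are aggregated.

```python
from collections import Counter, deque

import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D feature vectors (0.0 if either is zero)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


class TemporalVotingReID:
    """Hypothetical re-identification helper; an assumption, not the paper's code."""

    def __init__(self, window=5):
        # identity label -> reference feature captured before the occlusion
        self.prototypes = {}
        # best-matching identity for each of the last `window` frames
        self.votes = deque(maxlen=window)

    def register(self, identity, feature):
        """Store a pre-occlusion reference feature for an instrument identity."""
        self.prototypes[identity] = np.asarray(feature, dtype=float)

    def observe(self, feature):
        """Match one post-occlusion frame feature and record its vote."""
        feature = np.asarray(feature, dtype=float)
        best = max(
            self.prototypes,
            key=lambda k: cosine_similarity(self.prototypes[k], feature),
        )
        self.votes.append(best)
        return best

    def decide(self):
        """Return the majority identity over the window, or None without votes."""
        if not self.votes:
            return None
        return Counter(self.votes).most_common(1)[0][0]
```

In this sketch, a single noisy frame right after disocclusion cannot flip the identity on its own, because the decision is taken over several consecutive frames rather than per frame.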
