Motion-Boundary-Driven Unsupervised Surgical Instrument Segmentation in Low-Quality Optical Flow
Yang Liu, Peiran Wu, Jiayu Huo, Gongyu Zhang, Zhen Yuan, Christos Bergeles, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin
TL;DR
The paper addresses unsupervised surgical instrument segmentation in endoscopic videos, where low-quality optical flow hinders motion-based supervision. It introduces a motion-boundary-driven framework with three components: High-Quality Area Matching (HQAM), which focuses supervision on reliable motion boundaries; Low-Quality Cases Dropping (LQCD), which discards frames whose flow is weak across the whole image; and a variable frame-rate training scheme that captures subtle instrument motions. All three are built on a RAFT-informed backbone (RCF). The combined approach reaches mIoU scores of about 0.75 and 0.72 on the EndoVis2017 VOS and EndoVis2017 Challenge datasets, respectively, outperforming prior unsupervised methods and improving on its baselines by a large margin. This plug-and-play framework reduces dependence on manual annotations, enabling scalable, annotation-free surgical instrument segmentation, with potential extensions to other motion-driven tasks such as unsupervised depth estimation in clinical settings.
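As a rough illustration of the two ideas named above, the sketch below computes a motion-boundary map as the spatial gradient magnitude of a dense flow field and applies a simple frame-level filter. It is not the authors' HQAM or LQCD implementation, whose details are not given in this summary. The function names, the `(H, W, 2)` flow layout, the choice of the maximum boundary strength as a quality proxy, and the threshold value are all assumptions made for illustration.

```python
import numpy as np


def motion_boundary_map(flow):
    """Return the spatial gradient magnitude of a flow field.

    flow: array of shape (H, W, 2) holding horizontal and vertical
    displacement per pixel (assumed layout). Large values mark places
    where the flow changes abruptly, i.e. candidate motion boundaries.
    """
    # np.gradient on a 2D array returns derivatives along axis 0 (rows)
    # and axis 1 (columns), in that order.
    du_dy, du_dx = np.gradient(flow[..., 0])
    dv_dy, dv_dx = np.gradient(flow[..., 1])
    return np.sqrt(du_dx**2 + du_dy**2 + dv_dx**2 + dv_dy**2)


def keep_frame(flow, min_boundary_strength=0.5):
    """Hypothetical frame-level filter (a stand-in, not the paper's LQCD).

    A frame is kept only if its strongest motion boundary reaches the
    threshold; frames with uniformly weak or flat flow are dropped.
    """
    return bool(motion_boundary_map(flow).max() >= min_boundary_strength)


# Toy example: a square patch moving right over a static background.
flow = np.zeros((16, 16, 2))
flow[4:12, 4:12, 0] = 2.0
boundaries = motion_boundary_map(flow)
print(keep_frame(flow), keep_frame(np.zeros((16, 16, 2))))
```

In this toy case the boundary map is nonzero only around the edge of the moving square, which is the kind of region a boundary-focused loss would concentrate on, while the all-zero flow frame is rejected by the filter.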
Abstract
Unsupervised video-based surgical instrument segmentation has the potential to accelerate the adoption of robot-assisted procedures by reducing the reliance on manual annotations. However, the generally low quality of optical flow in endoscopic footage poses a major challenge for unsupervised methods that rely heavily on motion cues. To overcome this limitation, we propose a novel approach that pinpoints motion boundaries (regions with abrupt flow changes) while selectively discarding frames with globally low-quality flow and adapting to varying motion patterns. Experiments on the EndoVis2017 VOS and EndoVis2017 Challenge datasets show that our method achieves mean Intersection-over-Union (mIoU) scores of 0.75 and 0.72, respectively, effectively alleviating the constraints imposed by suboptimal optical flow. This enables a more scalable and robust surgical instrument segmentation solution in clinical settings. The code will be publicly released.
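For readers unfamiliar with the metric, the following minimal sketch shows how a mean Intersection-over-Union score can be computed for binary instrument masks. It assumes a per-frame average and treats two empty masks as a perfect match; the exact averaging protocol used for the reported numbers is not stated here, so both choices are assumptions, as are the function names.

```python
import numpy as np


def iou(pred, gt):
    """Intersection-over-Union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: both masks empty counts as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def mean_iou(preds, gts):
    """Average the per-frame IoU over paired predictions and ground truths."""
    return float(np.mean([iou(p, g) for p, g in zip(preds, gts)]))


# Two toy frames: one half-correct prediction, one exact prediction.
preds = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
gts = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
print(mean_iou(preds, gts))  # 0.75
```

The first frame has one overlapping pixel out of a union of two (IoU 0.5) and the second matches exactly (IoU 1.0), giving a mean of 0.75.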
