
Motion-Boundary-Driven Unsupervised Surgical Instrument Segmentation in Low-Quality Optical Flow

Yang Liu, Peiran Wu, Jiayu Huo, Gongyu Zhang, Zhen Yuan, Christos Bergeles, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin

TL;DR

The paper addresses unsupervised surgical instrument segmentation in endoscopic videos, where low-quality optical flow weakens motion-based supervision. It introduces a motion-boundary-driven framework with three components: High-Quality Area Matching (HQAM), which focuses supervision on reliable motion boundaries; Low-Quality Cases Dropping (LQCD), which discards frames with globally weak flow; and a variable frame-rate training scheme that captures subtle instrument motions. The framework builds on the RCF motion-guided segmentation model, supervised by pseudo optical flow from a pre-trained RAFT estimator. The combined approach reaches $mIoU$ scores of about $0.75$ and $0.72$ on the EndoVis 2017 VOS and Challenge datasets, respectively, outperforming prior unsupervised methods and improving substantially over the RCF baseline. This plug-and-play framework reduces dependence on manual annotations, enabling scalable, annotation-free surgical instrument segmentation, with potential extensions to other motion-driven tasks such as unsupervised depth estimation in clinical settings.
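The variable frame-rate scheme feeds the model two frames separated by a random interval $r$ (per the Figure 2 caption), so that larger gaps can amplify subtle instrument motion. Below is a minimal sketch of such pair sampling, assuming $r$ is drawn uniformly from 1 to a hypothetical `max_interval`; the actual distribution and bounds are not given in this summary, and `sample_frame_pairs` is an illustrative name, not the authors' code.

```python
import random
from typing import List, Tuple


def sample_frame_pairs(
    num_frames: int, batch_size: int, max_interval: int, seed: int = 0
) -> List[Tuple[int, int, int]]:
    """Sample (t, t + r, r) index triples with a random frame interval r.

    Hypothetical sketch: r is drawn uniformly from 1..max_interval (an
    assumption), so the model sees both small and large temporal gaps.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to form a pair")
    rng = random.Random(seed)
    # The interval can never exceed the clip length minus one.
    max_r = min(max_interval, num_frames - 1)
    pairs = []
    for _ in range(batch_size):
        r = rng.randint(1, max_r)                # random interval, inclusive bounds
        t = rng.randint(0, num_frames - 1 - r)   # first frame, so that t + r is valid
        pairs.append((t, t + r, r))
    return pairs


# Usage: eight training pairs drawn from a 100-frame clip.
print(sample_frame_pairs(num_frames=100, batch_size=8, max_interval=4))
```

Seeding a private `random.Random` instance keeps the sampling reproducible without touching global random state.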

Abstract

Unsupervised video-based surgical instrument segmentation has the potential to accelerate the adoption of robot-assisted procedures by reducing the reliance on manual annotations. However, the generally low quality of optical flow in endoscopic footage poses a great challenge for unsupervised methods that rely heavily on motion cues. To overcome this limitation, we propose a novel approach that pinpoints motion boundaries, regions with abrupt flow changes, while selectively discarding frames with globally low-quality flow and adapting to varying motion patterns. Experiments on the EndoVis2017 VOS and EndoVis2017 Challenge datasets show that our method achieves mean Intersection-over-Union (mIoU) scores of 0.75 and 0.72, respectively, effectively alleviating the constraints imposed by suboptimal optical flow. This enables a more scalable and robust surgical instrument segmentation solution in clinical settings. The code will be publicly released.
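The abstract defines motion boundaries as regions with abrupt flow changes. One simple way to locate them, shown below as a sketch rather than the paper's HQAM definition, is to threshold the spatial gradient magnitude of the pseudo flow $o_t$ and compute the supervision loss only inside that mask. The threshold `tau`, the gradient-based boundary test, and the squared-error flow loss are all assumptions made for illustration.

```python
import numpy as np


def motion_boundary_mask(flow: np.ndarray, tau: float) -> np.ndarray:
    """Return a boolean (H, W) mask that is True where the flow changes abruptly.

    flow: array of shape (H, W, 2) with per-pixel (u, v) displacements.
    tau:  hypothetical threshold on the spatial gradient magnitude.
    """
    # np.gradient returns derivatives along axis 0 (rows) and axis 1 (cols).
    du_dy, du_dx = np.gradient(flow[..., 0])
    dv_dy, dv_dx = np.gradient(flow[..., 1])
    grad_mag = np.sqrt(du_dx**2 + du_dy**2 + dv_dx**2 + dv_dy**2)
    return grad_mag > tau


def masked_l2_loss(
    pred_flow: np.ndarray, pseudo_flow: np.ndarray, mask: np.ndarray
) -> float:
    """Mean squared flow error restricted to the mask (0.0 if the mask is empty)."""
    if not mask.any():
        return 0.0
    sq_err = np.sum((pred_flow - pseudo_flow) ** 2, axis=-1)  # shape (H, W)
    return float(sq_err[mask].mean())


# Usage: the right half of the frame moves, so the boundary lies at the center.
flow = np.zeros((8, 8, 2))
flow[:, 4:, 0] = 5.0
mask = motion_boundary_mask(flow, tau=1.0)
print(mask.astype(int))
print(masked_l2_loss(np.zeros_like(flow), flow, mask))
```

Restricting the loss to boundary pixels means that large uniform regions of unreliable flow (for example dark areas) contribute nothing to the gradient.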

Paper Structure (11 sections, 4 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Examples of low-quality optical flow frames caused by stationary instruments, dark areas, and abrupt movements, which substantially limit model performance.
  • Figure 2: Overview of our proposed unsupervised instrument segmentation framework. Two frames, separated by a random interval $r$, are fed into both a motion-guided segmentation model (e.g., RCF Lian_2023_CVPR) and a pre-trained motion estimator (e.g., RAFT teed2020raft) that generates pseudo flow maps $o_t$. The proposed HQAM and LQCD modules refine these pseudo flow maps, yielding robust supervision.
  • Figure 3: Illustration of our HQAM and LQCD modules. HQAM derives a boundary-based mask from the pseudo optical flows $o_t$, isolating reliable high-quality regions to guide segmentation. Meanwhile, LQCD ranks the frames in a batch by their per-frame loss and discards the $h$ frames with the highest loss ("hard cases"), removing globally low-quality motion signals.
  • Figure 4: Qualitative comparison with the baseline model RCF, showing (a) optical-flow pseudo-labels predicted by RAFT, (b) ground truth from EndoVis 2017, (c) prediction masks of our method, and (d) prediction masks of RCF.
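The LQCD step described in the Figure 3 caption (rank frames by per-frame loss, drop the top $h$) can be sketched in a few lines. This is a minimal NumPy illustration assuming the batch loss is a plain mean over the kept frames; the function name `lqcd_batch_loss` is hypothetical and the paper's exact aggregation may differ.

```python
import numpy as np


def lqcd_batch_loss(per_frame_losses, h: int) -> float:
    """Average the batch loss after discarding the h largest per-frame losses.

    Sketch of the idea in the Figure 3 caption: the frames with the highest
    loss are treated as "hard cases" with globally low-quality flow.
    """
    losses = np.asarray(per_frame_losses, dtype=float)
    if not 0 <= h < losses.size:
        raise ValueError("h must satisfy 0 <= h < batch size")
    # argsort is ascending, so keeping the first (B - h) indices drops the top h.
    keep = np.argsort(losses)[: losses.size - h]
    return float(losses[keep].mean())


# Usage: the frame with loss 10.0 is dropped as a hard case.
print(lqcd_batch_loss([1.0, 2.0, 10.0, 3.0], h=1))
```

In an autodiff framework the same selection (a sort or top-k over per-frame losses) means the dropped frames contribute no gradient for that step, which is what lets the batch ignore globally unreliable flow.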