AMNCutter: Affinity-Attention-Guided Multi-View Normalized Cutter for Unsupervised Surgical Instrument Segmentation

Mingyu Sheng, Jianan Fan, Dongnan Liu, Ron Kikinis, Weidong Cai

TL;DR

This work proposes a label-free unsupervised model featuring a novel module named Multi-View Normalized Cutter (m-NCutter), trained using a graph-cutting loss function that leverages patch affinities for supervision, eliminating the need for pseudo-labels.

Abstract

Surgical instrument segmentation (SIS) is pivotal for robotic-assisted minimally invasive surgery, assisting surgeons by identifying surgical instruments in endoscopic video frames. Recent unsupervised surgical instrument segmentation (USIS) methods primarily rely on pseudo-labels derived from low-level features such as color and optical flow, but these methods show limited effectiveness and generalizability in complex and unseen endoscopic scenarios. In this work, we propose a label-free unsupervised model featuring a novel module named Multi-View Normalized Cutter (m-NCutter). Different from previous USIS works, our model is trained using a graph-cutting loss function that leverages patch affinities for supervision, eliminating the need for pseudo-labels. The framework adaptively determines which affinities from which levels should be prioritized. Therefore, the low- and high-level features and their affinities are effectively integrated to train a label-free unsupervised model, showing superior effectiveness and generalization ability. We conduct comprehensive experiments across multiple SIS datasets to validate our approach's state-of-the-art (SOTA) performance, robustness, and exceptional potential as a pre-trained model. Our code is released at https://github.com/MingyuShengSMY/AMNCutter.
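
To make the idea of a graph-cutting loss supervised by patch affinities concrete, here is a minimal sketch of the generic soft normalized-cut objective. It is not the authors' m-NCut loss, whose details are not given in this summary. The function names `cosine_affinity` and `soft_ncut_loss`, the use of clipped cosine similarity as the affinity, and the toy data are all assumptions made for illustration.

```python
import math


def cosine_affinity(features):
    """Pairwise cosine similarity between patch features, clipped at 0.

    Assumption: cosine similarity stands in for whatever affinity the
    paper actually uses.
    """
    # A zero vector gets norm 1.0 to avoid division by zero.
    norms = [math.sqrt(sum(x * x for x in f)) or 1.0 for f in features]
    n = len(features)
    return [
        [
            max(0.0, sum(a * b for a, b in zip(features[i], features[j]))
                / (norms[i] * norms[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


def soft_ncut_loss(affinity, probs, eps=1e-8):
    """Generic soft normalized-cut loss (illustrative, not the paper's loss).

    affinity: N x N non-negative symmetric patch affinities W.
    probs:    N x K soft cluster assignments, each row summing to 1.
    Returns K - sum_k (p_k^T W p_k) / (p_k^T W 1).
    """
    n = len(affinity)
    k = len(probs[0])
    degree = [sum(row) for row in affinity]  # d_i = sum_j W_ij
    loss = float(k)
    for c in range(k):
        p = [probs[i][c] for i in range(n)]
        # Affinity mass that stays inside cluster c.
        assoc_in = sum(
            p[i] * affinity[i][j] * p[j] for i in range(n) for j in range(n)
        )
        # Affinity mass from cluster c to all patches.
        assoc_all = sum(p[i] * degree[i] for i in range(n))
        loss -= assoc_in / (assoc_all + eps)
    return loss


# Toy example: four patches forming two visually similar pairs.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
W = cosine_affinity(features)
grouped = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # similar patches together
mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]    # similar patches split
print(soft_ncut_loss(W, grouped) < soft_ncut_loss(W, mixed))
```

The loss is lower when patches with high mutual affinity share a cluster, so minimizing it needs no pseudo-labels: the affinities themselves act as the supervision signal.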

Paper Structure

This paper contains 13 sections, 11 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Method Overview. a) Pre-trained Backbone, introduced in the Method section on the backbone; b) m-NCutter, our novel module, presented in the Method section on m-NCutter; c) NCut Loss, detailed in the Method section on the m-NCut loss.
  • Figure 2: Multi-View Normalized Cutter (m-NCutter). Multi-View Self-Attention is the key module: it produces the fused feature map and the affinity-wise attention. "$\odot$": element-wise multiplication.
  • Figure 3: Multi-View Self-Attention. Only one head is shown for simplicity; our experiments use multiple heads.
  • Figure 4: Different Effects on Loose and Tight Clusters. The loose cluster (left) attracts more nodes/patches due to its lower tightness $\tau_c$, whereas the tight cluster (right) repels nodes with lower probability because of its higher $\tau_c$. The arrow indicates "pull" or "push". For aesthetic purposes, not all arrows are drawn.
  • Figure 5: Semantic Segmentation Visualization. "a, b" and "c, d" are from the EndoVis2018 and CholecSeg8k datasets, respectively.
  • ...and 1 more figures