Exploring Optical Flow Inclusion into nnU-Net Framework for Surgical Instrument Segmentation

Marcos Fernández-Rodríguez, Bruno Silva, Sandro Queirós, Helena R. Torres, Bruno Oliveira, Pedro Morais, Lukas R. Buschle, Jorge Correia-Pinto, Estevão Lima, João L. Vilaça

TL;DR

Problem: robust surgical instrument segmentation in laparoscopy is difficult because the scene is dynamic, with constant motion and appearance changes. Approach: add optical flow (OF) maps as extra inputs to the nnU-Net framework while leaving its architecture unchanged; OF is generated with ARFlow and tested in three representations (RGBof, XY, PC) and two time references (t1, t5). Findings: incorporating OF improves segmentation performance, notably a mean Dice coefficient ($DC$) increase of about 7.8% and a recall increase of about 10.6%, with RGBof giving the strongest gains for the moving L-hook class. Significance: temporal information can be integrated into established, low-expertise pipelines with minimal changes, which motivates future work on OF-preserving augmentations and multi-dataset validation.
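For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of the standard Dice coefficient and recall for a single binary mask. It is not the authors' evaluation code: the function name `dice_and_recall`, the `eps` smoothing term, and the toy masks are illustrative assumptions, and the summary does not state how scores are averaged across classes or frames.

```python
import numpy as np


def dice_and_recall(pred, target, eps=1e-8):
    """Return (Dice, recall) for two binary masks of equal shape.

    Hypothetical helper for illustration only.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    tp = np.logical_and(pred, target).sum()    # predicted and present
    fp = np.logical_and(pred, ~target).sum()   # predicted but absent
    fn = np.logical_and(~pred, target).sum()   # present but missed

    dice = 2 * tp / (2 * tp + fp + fn + eps)   # overlap between the two masks
    recall = tp / (tp + fn + eps)              # fraction of the target recovered
    return float(dice), float(recall)


# Toy example: the prediction finds 2 of the 4 instrument pixels, no false alarms.
pred = np.array([[1, 1, 0],
                 [0, 0, 0]])
target = np.array([[1, 1, 1],
                   [1, 0, 0]])
dice, recall = dice_and_recall(pred, target)
print(f"Dice={dice:.3f}, recall={recall:.3f}")
```

Recall only penalizes missed instrument pixels, whereas Dice also penalizes false positives, so a larger gain in recall than in Dice is consistent with the model finding more of a class it previously under-detected.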

Abstract

Surgical instrument segmentation in laparoscopy is essential for computer-assisted surgical systems. Despite recent progress in deep learning, the dynamic setting of laparoscopic surgery still makes precise segmentation challenging. The nnU-Net framework has excelled in semantic segmentation by analyzing single frames, without temporal information. Its ease of use, including automatic self-configuration and low expertise requirements, has made it a popular base framework for comparisons. Optical flow (OF) is a tool commonly used in video tasks to estimate motion and represent it in a single frame that carries temporal information. This work employs OF maps as an additional input to the nnU-Net architecture to improve its performance in surgical instrument segmentation, taking advantage of the fact that instruments are the main moving objects in the surgical field. With this new input, the temporal component is added indirectly, without modifying the architecture. Using the CholecSeg8k dataset, three different representations of movement were estimated, used as new inputs, and compared with a baseline model. Results showed that the use of OF maps improves the detection of classes with high movement, even when these are scarce in the dataset. To further improve performance, future work may focus on implementing other OF-preserving augmentations.
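The idea of feeding motion to an unchanged network as extra input channels can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the flow field here is a random array standing in for an ARFlow estimate, "PC" is assumed to mean polar coordinates (magnitude and angle), the HSV-style colour coding used for "RGBof" is a common convention rather than one confirmed by this summary, and all function names are hypothetical.

```python
import numpy as np


def flow_to_xy(flow):
    """'XY': horizontal and vertical displacement as two channels (2, H, W)."""
    return np.moveaxis(flow, -1, 0).astype(np.float32)


def flow_to_polar(flow):
    """'PC' (assumed polar coordinates): magnitude and angle as two channels."""
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx)  # radians in [-pi, pi]
    return np.stack([magnitude, angle]).astype(np.float32)


def flow_to_rgb(flow, eps=1e-8):
    """'RGBof': colour-coded flow, hue = direction, brightness = speed."""
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)
    hue6 = (np.arctan2(dy, dx) + np.pi) / (2 * np.pi) * 6.0  # hue scaled to [0, 6]
    value = magnitude / (magnitude.max() + eps)               # brightness in [0, 1]
    channels = []
    for n in (5, 3, 1):  # closed-form HSV -> RGB with saturation fixed to 1
        k = (n + hue6) % 6.0
        channels.append(value - value * np.clip(np.minimum(k, 4.0 - k), 0.0, 1.0))
    return np.stack(channels).astype(np.float32)


def stack_inputs(rgb, flow, representation="RGBof"):
    """Concatenate a video frame (H, W, 3) with an encoded flow map, channels first."""
    encoders = {"RGBof": flow_to_rgb, "XY": flow_to_xy, "PC": flow_to_polar}
    rgb_chw = np.moveaxis(rgb.astype(np.float32) / 255.0, -1, 0)
    return np.concatenate([rgb_chw, encoders[representation](flow)], axis=0)


# Placeholder data: a tiny random frame and a random flow field.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
flow = rng.normal(size=(4, 6, 2)).astype(np.float32)

x_rgbof = stack_inputs(frame, flow, "RGBof")
x_xy = stack_inputs(frame, flow, "XY")
x_pc = stack_inputs(frame, flow, "PC")
print(x_rgbof.shape, x_xy.shape, x_pc.shape)
```

Only the number of input channels changes between variants (three extra for RGBof, two for XY and PC), which is why the segmentation architecture itself can stay untouched.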

Paper Structure (14 sections, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Overview of the proposed workflow and segmentation classes: a) Grasper; b) L-hook.
  • Figure 2: Graphical representation of the dataset used.
  • Figure 3: Four examples of the segmentation results for the 'RGB' (left) and 't1 RGBof' (middle) variants, plus the 'RGBof' image (right) used as OF input in the latter.
  • Figure 4: CholecSeg8k inconsistencies found during results analysis.