SurgeoNet: Realtime 3D Pose Estimation of Articulated Surgical Instruments from Stereo Images using a Synthetically-trained Network

Ahmed Tawfik Aboukhadra, Nadia Robertini, Jameel Malik, Ahmed Elhayek, Gerd Reis, Didier Stricker

TL;DR

This work presents SurgeoNet, a real-time neural network pipeline that accurately detects and tracks surgical instruments from a stereo VR view. Its multi-stage design draws on state-of-the-art neural-network architectures such as YOLO and Transformers.

Abstract

Surgery monitoring in Mixed Reality (MR) environments has recently received substantial focus due to its importance in image-based decisions, skill assessment, and robot-assisted surgery. Tracking hands and articulated surgical instruments is crucial for the success of these applications. Due to the lack of annotated datasets and the complexity of the task, only a few works have addressed this problem. In this work, we present SurgeoNet, a real-time neural network pipeline to accurately detect and track surgical instruments from a stereo VR view. Our multi-stage approach is inspired by state-of-the-art neural-network architectural design, like YOLO and Transformers. We demonstrate the generalization capabilities of SurgeoNet in challenging real-world scenarios, achieved solely through training on synthetic data. The approach can be easily extended to any new set of articulated surgical instruments. SurgeoNet's code and data are publicly available.

Paper Structure (17 sections, 4 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: SurgeoNet Architecture.
  • Figure 2: a) A rendered synthetic image of the studied set of surgical instruments. b) Synthetic images generated using PyTorch3D that include medical instruments in random poses with their annotations on background images from the HO-3D dataset (Hampali et al., 2020).
  • Figure 3: The results of SurgeoNet on real unseen images.
  • Figure 4: YOLOv8 Ablation Study: a) The impact of the YOLOv8 architecture and image resolution on the accuracy (Box and Keypoint mAP@50-95) and runtime performance (S-FPS). b) The impact of the amount of training data on YOLOv8's performance.
  • Figure 5: Confusion matrix of YOLOv8 for the 13 surgical instruments.
  • ...and 1 more figure