Monocular Marker-free Patient-to-Image Intraoperative Registration for Cochlear Implant Surgery

Yike Zhang, Eduardo Davalos Anaya, Jack H. Noble

TL;DR

This paper presents a novel method for monocular patient-to-image intraoperative registration that operates without external hardware tracking equipment or fiducial point markers and integrates seamlessly with monocular surgical microscopes.

Abstract

This paper presents a novel method for monocular patient-to-image intraoperative registration, specifically designed to operate without any external hardware tracking equipment or fiducial point markers. Leveraging a synthetic microscopy surgical scene dataset with a wide range of transformations, our approach directly maps preoperative CT scans to 2D intraoperative surgical frames through a lightweight neural network for real-time cochlear implant surgery guidance via a zero-shot learning approach. Unlike traditional methods, our framework seamlessly integrates with monocular surgical microscopes, making it highly practical for clinical use without additional hardware dependencies and requirements. Our method estimates camera poses, which include a rotation matrix and a translation vector, by learning from the synthetic dataset, enabling accurate and efficient intraoperative registration. The proposed framework was evaluated on nine clinical cases using a patient-specific and cross-patient validation strategy. Our results suggest that our approach achieves clinically relevant accuracy in predicting 6D camera poses for registering 3D preoperative CT scans to 2D surgical scenes with an angular error within 10 degrees in most cases, while also addressing limitations of traditional methods, such as reliance on external tracking systems or fiducial markers.
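The abstract reports accuracy as an angular error on the predicted rotation, with most cases falling within 10 degrees. The excerpt does not say exactly how that error is computed, so the following is only a minimal sketch of one common choice, the geodesic angle between a predicted and a reference rotation matrix. The function names (`rotation_angle_deg`, `rot_z`) and the example angles are hypothetical and are not taken from the paper.

```python
import numpy as np


def rotation_angle_deg(R_pred, R_true):
    """Geodesic angle in degrees between two 3x3 rotation matrices.

    Assumption: this is a standard rotation-error metric; the paper's
    exact definition is not given in this excerpt.
    """
    # Relative rotation that takes R_true onto R_pred.
    R_rel = R_pred @ R_true.T
    # For a rotation by theta, trace(R) = 1 + 2*cos(theta).
    cos_theta = (np.trace(R_rel) - 1.0) / 2.0
    # Guard against round-off pushing the value outside [-1, 1].
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


def rot_z(deg):
    """Hypothetical helper: rotation about the z-axis by `deg` degrees."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


# Illustrative values only: a reference pose and a prediction 7 degrees off.
R_true = rot_z(30.0)
R_pred = rot_z(37.0)

err = rotation_angle_deg(R_pred, R_true)
within_threshold = err <= 10.0  # the 10-degree figure quoted in the abstract
```

A translation error could be reported alongside it, for example as the Euclidean norm of the difference between the predicted and reference translation vectors, though the units and normalization would depend on how the camera model is defined.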

Paper Structure

This paper contains 7 sections, 4 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Construct Postmastoidectomy Surface. Extract the 3D surface directly from the preoperative CT scan.
  • Figure 2: Pipeline for Synthesizing a Postmastoidectomy Surgical Scene. (a) original surgical scene, (b) 3D-to-2D registration, (c) textured postmastoidectomy surface, and (d) synthetic surgical scene.
  • Figure 3: Multi-view Synthetic Surgical Scenes. Each synthetic surgical scene was generated using a distinct $\mathbf{P}$ to show how the multi-view surgical scene synthesis captures diverse perspectives across the left and right ears of different patients.
  • Figure 4: Proposed Pose Regression Model. $\vec{R}$ and $\vec{t}$ are output by the network to register the 3D postmastoidectomy surface directly to the 2D image.
  • Figure 5: Performance Comparisons. Comprehensive evaluation of nine individual patient cases under both patient-specific and cross-patient scenarios to analyze robustness and generalizability.
  • ...and 1 more figure