
Domain adaptation strategies for 3D reconstruction of the lumbar spine using real fluoroscopy data

Sascha Jecklin, Youyang Shen, Amandine Gout, Daniel Suter, Lilian Calvet, Lukas Zingg, Jennifer Straub, Nicola Alessandro Cavalcanti, Mazda Farshad, Philipp Fürnstahl, Hooman Esfandiari

TL;DR

This work tackles the domain gap that limits real-world deployment of intraoperative 3D spine reconstruction from sparse fluoroscopy. It combines synthetic digitally reconstructed radiograph (DRR) data with paired ex-vivo X-ray data, and uses transfer learning and style transfer to bridge the gap between the two domains. This enables real-time reconstruction of the lumbar spine from as few as three X-ray views. The approach achieves surface accuracy suitable for surgical guidance (84% F1 score), performs robustly across view configurations, and runs inference in about 81 ms, supporting potential integration into surgical navigation and robotics. The study also contributes an ex-vivo paired dataset and analyses of view-angle sensitivity, challenging anatomical regions, and reconstruction-origin estimation to inform clinical translation and future system improvements.

Abstract

This study tackles key obstacles to adopting surgical navigation in orthopedic surgery, including time, cost, radiation, and workflow integration. Our recent work, X23D, introduced an approach for generating 3D anatomical models of the spine from only a few intraoperative fluoroscopic images. By creating a direct intraoperative 3D reconstruction of the anatomy, it removes the need for conventional registration-based surgical navigation. However, the practical application of X23D has been limited by a domain gap between synthetic training data and real intraoperative images. In response, we devised a novel data collection protocol for a paired dataset of synthetic and real fluoroscopic images acquired from the same viewpoints. Using this dataset, we refined our deep learning model via transfer learning, effectively bridging the domain gap between synthetic and real X-ray data. A novel style transfer mechanism also converts real X-rays into the style of the synthetic domain, enabling our in-silico-trained X23D model to achieve high accuracy in real-world settings. Our results demonstrate that the refined model can rapidly generate accurate 3D reconstructions of the entire lumbar spine from as few as three intraoperative fluoroscopic shots. It achieved an 84% F1 score, matching the accuracy of our previous study based on synthetic data. Additionally, with a computation time of only 81.1 ms, our approach provides the real-time performance essential for integration into surgery. By examining optimal imaging setups and view-angle dependencies, we further confirmed the system's practicality and reliability in clinical settings. Our research marks a significant step forward in intraoperative 3D reconstruction, offering enhancements to surgical planning, navigation, and robotics.
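The abstract reports an 84% F1 score but does not define it here. A common definition for 3D surface reconstruction, assumed rather than confirmed for this paper, samples points on the predicted and ground-truth surfaces and counts a point as correct if its nearest neighbour on the other surface lies within a distance threshold τ. Precision uses predicted points, recall uses ground-truth points, and F1 is their harmonic mean. The function name `surface_f1` and the threshold value are hypothetical; the paper's actual τ and sampling scheme are not stated in this section.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _nearest_distance(p: Point, cloud: Sequence[Point]) -> float:
    """Euclidean distance from p to its nearest neighbour in cloud (brute force)."""
    return min(math.dist(p, q) for q in cloud)


def surface_f1(pred: Sequence[Point], gt: Sequence[Point], tau: float) -> Tuple[float, float, float]:
    """Hypothetical helper: (precision, recall, F1) of two point clouds at threshold tau.

    Assumption: tau is in the same units as the points (e.g. mm); the value used
    in the paper is not given in this section.
    """
    if not pred or not gt:
        return 0.0, 0.0, 0.0
    # Precision: share of predicted points lying close to the ground-truth surface.
    precision = sum(_nearest_distance(p, gt) <= tau for p in pred) / len(pred)
    # Recall: share of ground-truth points covered by the prediction.
    recall = sum(_nearest_distance(g, pred) <= tau for g in gt) / len(gt)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    p, r, f = surface_f1([(0, 0, 0), (1, 0, 0), (5, 0, 0)], [(0, 0, 0), (1, 0, 0)], tau=0.5)
    print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

The brute-force nearest-neighbour search keeps the sketch dependency-free; for full vertebra meshes with many thousands of points, a KD-tree would normally replace it.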

Paper Structure (31 sections, 4 equations, 12 figures, 5 tables)

Figures (12)

  • Figure 1: Frontend of our custom-built DRR generator software. Intrinsic and extrinsic parameters can be adjusted in the left-hand panel. The resulting DRR and the pose of the virtual C-arm are displayed next to it.
  • Figure 2: Our data collection process generates synthetic-real X-ray image pairs with pose information.
  • Figure 3: X-ray calibration pipeline. a) the fiducial projections are detected (blue: detected reference fiducial, green: detected fiducial), b) the initially estimated 2D-3D correspondence, c) the rectified 2D-3D correspondence, d) fiducial inpainting.
  • Figure 4: Visual representation of the reconstruction cube origin estimation. A vertebral level of interest is first localized in all images, and the resulting shift is compensated for with the shift equation. The closest intersection of the vertebra center back-projected through the 2D image centers is then calculated with the intersection equation. The resulting object center lies at a known offset from the reconstruction cube origin; estimating it accurately is crucial for accurate 3D reconstruction.
  • Figure 5: Our 3D reconstruction network. A 2D feature extraction stage first processes pairs of input images with the corresponding poses. These features are then back-projected into 3D, where they are averaged and refined by a 3D stage. The reconstruction is placed at a centroid estimated from the intersection of all input images.
  • ...and 7 more figures
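The Figure 4 caption computes the "closest intersection" of rays back-projected from the vertebra center in each view. Since noisy rays rarely meet exactly, one standard formulation, assumed here and not necessarily identical to the paper's equation, picks the point x that minimizes the summed squared perpendicular distance to all rays. With ray origins o_i (e.g. the X-ray source of view i) and unit directions d_i, this yields the linear system Σ(I − d_i d_iᵀ) x = Σ(I − d_i d_iᵀ) o_i. The function name `closest_point_to_rays` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    if n == 0:
        raise ValueError("ray direction must be non-zero")
    return (v[0] / n, v[1] / n, v[2] / n)


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def closest_point_to_rays(origins: Sequence[Vec3], directions: Sequence[Vec3]) -> Vec3:
    """Hypothetical helper: least-squares point closest to a set of 3D rays."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        d = _normalize(d)
        for i in range(3):
            for j in range(3):
                # Entry of I - d d^T, which projects onto the plane orthogonal to the ray.
                p_ij = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += p_ij
                b[i] += p_ij * o[j]
    det = _det3(A)
    if abs(det) < 1e-12:
        raise ValueError("rays are (nearly) parallel; intersection is ill-defined")
    # Solve A x = b with Cramer's rule: replace column k of A with b.
    x = []
    for k in range(3):
        a_k = [[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]
        x.append(_det3(a_k) / det)
    return (x[0], x[1], x[2])


if __name__ == "__main__":
    # Two skew rays: the x-axis, and a line parallel to y at z = 1.
    print(closest_point_to_rays([(0, 0, 0), (0, 0, 1)], [(1, 0, 0), (0, 1, 0)]))
```

For the two skew rays in the usage example, the least-squares point is the midpoint of their common perpendicular, (0, 0, 0.5). Adding more views, as in the three-shot setup, simply adds more terms to the sums.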
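The Figure 5 caption states that 2D features are back-projected into 3D and averaged across views before a 3D refinement stage. The sketch below shows only the back-projection and averaging step, under simplifying assumptions that are not taken from the paper: single-channel feature maps, a 3×4 pinhole projection matrix P = K[R | t] per view, and nearest-pixel sampling instead of the interpolation a real network would likely use. The function names `project` and `backproject_average` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Matrix34 = Sequence[Sequence[float]]    # 3x4 projection matrix, P = K [R | t]
FeatureMap = Sequence[Sequence[float]]  # H x W single-channel feature map
Vec3 = Tuple[float, float, float]


def project(P: Matrix34, X: Vec3) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a 3D point to pixel coordinates (u, v)."""
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    if w <= 0:
        return None  # point lies behind the X-ray source
    return u / w, v / w


def backproject_average(features: Sequence[FeatureMap],
                        projections: Sequence[Matrix34],
                        voxel_centers: Sequence[Vec3]) -> List[float]:
    """Hypothetical helper: mean feature over all views in which each voxel is visible."""
    out = []
    for X in voxel_centers:
        total, count = 0.0, 0
        for fmap, P in zip(features, projections):
            uv = project(P, X)
            if uv is None:
                continue
            col, row = round(uv[0]), round(uv[1])  # nearest-pixel sampling
            if 0 <= row < len(fmap) and 0 <= col < len(fmap[0]):
                total += fmap[row][col]
                count += 1
        # Voxels seen by no view get 0.0; a real model might use a learned default.
        out.append(total / count if count else 0.0)
    return out


if __name__ == "__main__":
    P1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    P2 = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]]  # same as P1, shifted one pixel in u
    f1 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    f2 = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
    print(backproject_average([f1, f2], [P1, P2], [(1, 1, 1), (2, 0, 1), (0, 0, -1)]))
```

Averaging over only the views in which a voxel projects inside the image keeps the result independent of how many views are supplied. This is consistent with the caption's claim that the network accepts a variable number of input images.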