Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries

Christiaan G. A. Viviers, Lena Filatova, Maurice Termeer, Peter H. N. de With, Fons van der Sommen

TL;DR

This work tackles real-time 6-DoF instrument pose estimation under variable X-ray imaging geometries by introducing a geometry-aware data-acquisition baseline and a novel YOLOv5-6D architecture that predicts 2D key-point projections of a 3D bounding box and resolves pose via PnP under the current geometry. It demonstrates competitive RGB performance on LINEMOD while delivering real-time, geometry-aware X-ray pose estimation across diverse acquisition settings and semantic complexity, validated on cube, screw, and spine-phantom data with ADD(-S) scores up to $92.41\%$ at $0.1\cdot d$ and 41.9 FPS. The key contributions include a scalable automatic labeling setup using an external optical camera with a ChArUco board, a CSP-Net/BiFPN-based YOLOv5-6D architecture with multi-scale key-point prediction, and a calibrated X-ray projection model that enables generalization across devices. Overall, the approach offers a practical route to geometry-aware, intraoperative instrument guidance without relying on fixed imaging geometries or external navigation systems, potentially improving precision and reducing radiation exposure.
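The key-point formulation can be illustrated with a minimal sketch. It assumes the X-ray source behaves like a pinhole whose focal length in pixels equals the source-to-detector distance divided by the detector pixel pitch, and that the principal point is known. The function names, the example distances, and the pixel pitch are all hypothetical and are not taken from the paper or its code. The sketch only covers the forward projection of the eight bounding-box corners; recovering the pose from the predicted 2D points would additionally need a PnP solver, which is omitted here.

```python
from itertools import product


def box_corners_3d(size_x, size_y, size_z):
    """Eight corners of a box centred on the object origin (object frame, mm)."""
    hx, hy, hz = size_x / 2.0, size_y / 2.0, size_z / 2.0
    return [(sx * hx, sy * hy, sz * hz)
            for sx, sy, sz in product((-1.0, 1.0), repeat=3)]


def transform(R, t, p):
    """Apply rotation R (3x3 nested lists) and translation t to point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project_point(p_src, focal_px, cx, cy):
    """Perspective projection of a point given in the source frame.

    Assumption: z is the distance from the X-ray source along the central ray.
    """
    x, y, z = p_src
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def project_box(size, R, t, sid_mm, pixel_mm, cx, cy):
    """2D key points of the box corners for one acquisition geometry."""
    focal_px = sid_mm / pixel_mm  # hypothetical geometry-to-intrinsics mapping
    return [project_point(transform(R, t, c), focal_px, cx, cy)
            for c in box_corners_3d(*size)]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Same object pose, two different source-to-detector distances (illustrative values).
kps_a = project_box((20.0, 20.0, 20.0), IDENTITY, (0.0, 0.0, 800.0),
                    sid_mm=1200.0, pixel_mm=0.3, cx=512.0, cy=512.0)
kps_b = project_box((20.0, 20.0, 20.0), IDENTITY, (0.0, 0.0, 800.0),
                    sid_mm=1000.0, pixel_mm=0.3, cx=512.0, cy=512.0)
```

With the pose held fixed, `kps_a` and `kps_b` differ only because the acquisition geometry differs, which is why the 2D key points have to be interpreted under the current geometry when the pose is resolved.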

Abstract

Accurate 6-DoF pose estimation of surgical instruments during minimally invasive surgeries can substantially improve treatment strategies and eventual surgical outcome. Existing deep learning methods have achieved accurate results, but they require custom approaches for each object and laborious setup and training environments, often stretching to extensive simulations, whilst lacking real-time computation. We propose a general-purpose approach to data acquisition for 6-DoF pose estimation tasks in X-ray systems, a novel and general-purpose YOLOv5-6D pose architecture for accurate and fast object pose estimation, and a complete method for surgical screw pose estimation from a monocular cone-beam X-ray image that takes the acquisition geometry into account. The proposed YOLOv5-6D pose model achieves competitive results on public benchmarks whilst being considerably faster at 42 FPS on GPU. In addition, the method generalizes across varying X-ray acquisition geometry and semantic image complexity to enable accurate pose estimation over different domains. Finally, the proposed approach is tested for bone-screw pose estimation for computer-aided guidance during spine surgeries. The model achieves 92.41% on the 0.1 ADD-S metric, demonstrating a promising approach for enhancing surgical precision and patient outcomes. The code for YOLOv5-6D is publicly available at https://github.com/cviviers/YOLOv5-6D-Pose
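For readers unfamiliar with the reported metric, the following is a minimal sketch of how ADD and ADD-S are commonly defined in the 6-DoF pose literature: ADD averages the distance between corresponding model points under the ground-truth and estimated poses, while ADD-S averages the distance to the closest estimated point and is intended for symmetric objects. A pose is typically counted as correct when the error is below 10% of the object diameter. The function names and the toy cube model are hypothetical, and the sketch is not taken from the paper's evaluation code.

```python
import math
from itertools import product


def transform(R, t, p):
    """Apply rotation R (3x3 nested lists) and translation t to point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_error(points, R_gt, t_gt, R_est, t_est):
    """ADD: mean distance between corresponding transformed model points."""
    dists = [math.dist(transform(R_gt, t_gt, p), transform(R_est, t_est, p))
             for p in points]
    return sum(dists) / len(dists)


def add_s_error(points, R_gt, t_gt, R_est, t_est):
    """ADD-S: mean distance from each ground-truth point to its nearest estimated point."""
    gt = [transform(R_gt, t_gt, p) for p in points]
    est = [transform(R_est, t_est, p) for p in points]
    return sum(min(math.dist(g, e) for e in est) for g in gt) / len(gt)


def diameter(points):
    """Largest distance between any two model points."""
    return max(math.dist(a, b) for a in points for b in points)


def pose_is_correct(error, diam, fraction=0.1):
    """Threshold test at a fraction of the object diameter (0.1 by default)."""
    return error < fraction * diam


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Toy model: corners of a 20 mm cube; the estimate is shifted by 2 mm along x.
cube = [(sx * 10.0, sy * 10.0, sz * 10.0)
        for sx, sy, sz in product((-1.0, 1.0), repeat=3)]
err_add = add_error(cube, IDENTITY, (0.0, 0.0, 800.0), IDENTITY, (2.0, 0.0, 800.0))
err_adds = add_s_error(cube, IDENTITY, (0.0, 0.0, 800.0), IDENTITY, (2.0, 0.0, 800.0))
accepted = pose_is_correct(err_add, diameter(cube))
```

The reported percentage is then the share of test images whose pose passes this threshold. Because ADD-S takes a minimum over all estimated points, it can never exceed ADD for the same pair of poses.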

Paper Structure

This paper contains 25 sections, 6 equations, 9 figures, and 6 tables.

Figures (9)

  • Figure 1: X-ray projection model depicting the X-ray source, a surgical screw, detector with an attached grayscale optical camera, the detector panel and the captured X-ray image. The frame of reference for each point of interest is also depicted.
  • Figure 2: (a) Grayscale image showcasing the setup used to automatically acquire the 6-DoF pose of various objects. (b) Corresponding X-ray DICOM image of the cube. (c) Projected 3D bounding box and virtual corner coordinates.
  • Figure 3: Examples from the applied screw training, validation and test datasets. Each image also showcases the projected 3D bounding box of the screw.
  • Figure 4: Overview of the YOLOv5-6D architecture. The model backbone is based on the CSP-Net, the neck consists of the BiFPN architecture and the new model head predicts object key points at different scales. The model takes as input an image containing the object of interest and predicts object-specific key points that are used to estimate the object pose. The subblocks C3, BT1, BT2 and SPPF are depicted at the bottom in an enlarged view for further detail.
  • Figure 5: Example image from the applied cube datasets before and after augmentation for training. Notably, in the augmented image a bicycle partially "occludes" the cube in the center, the background is changed and the image is scaled.
  • ...and 4 more figures