Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries
Christiaan G. A. Viviers, Lena Filatova, Maurice Termeer, Peter H. N. de With, Fons van der Sommen
TL;DR
This work tackles real-time 6-DoF instrument pose estimation under variable X-ray imaging geometries. It introduces a geometry-aware data-acquisition baseline and a novel YOLOv5-6D architecture that predicts the 2D key-point projections of an object's 3D bounding box and recovers the pose via PnP under the current acquisition geometry. The method achieves competitive RGB performance on LINEMOD and delivers real-time, geometry-aware X-ray pose estimation across diverse acquisition settings and levels of semantic complexity. It is validated on cube, screw, and spine-phantom data, with ADD(-S) scores up to $92.41\%$ at a threshold of $0.1\cdot d$, where $d$ is the object diameter, at 41.9 FPS. The key contributions are a scalable automatic labeling setup that uses an external optical camera with a ChArUco board, a CSP-Net/BiFPN-based YOLOv5-6D architecture with multi-scale key-point prediction, and a calibrated X-ray projection model that enables generalization across devices. Overall, the approach offers a practical route to geometry-aware, intraoperative instrument guidance that does not rely on fixed imaging geometries or external navigation systems, potentially improving precision and reducing radiation exposure.
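To make the key-point formulation concrete, the following is a minimal sketch, not the authors' implementation, of how the corners of a 3D bounding box map to 2D key-points for a given pose. It assumes an ideal pinhole model in which `focal_px` stands in for the source-to-detector distance expressed in pixels; the function names, the inclusion of the box centroid as an extra key-point, and all numeric values are illustrative assumptions. PnP solves the inverse problem: given the predicted 2D key-points and the known 3D box, it recovers the rotation and translation.

```python
from itertools import product


def bounding_box_keypoints(half_extents):
    """Return the centroid plus the 8 corners of an object-frame 3D box.

    Including the centroid is an assumption made for this sketch.
    """
    hx, hy, hz = half_extents
    corners = [
        (sx * hx, sy * hy, sz * hz)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]
    return [(0.0, 0.0, 0.0)] + corners


def project_points(points, rotation, translation, focal_px, principal_point):
    """Project object-frame 3D points to pixels with an ideal pinhole model.

    rotation is a 3x3 nested list, translation a 3-vector; focal_px and
    principal_point are hypothetical intrinsics describing the geometry.
    """
    cx, cy = principal_point
    pixels = []
    for p in points:
        # Rigid transform into the source frame: X_c = R @ X_o + t.
        xc, yc, zc = (
            sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
            for r in range(3)
        )
        # Perspective division followed by the shift to the principal point.
        pixels.append((focal_px * xc / zc + cx, focal_px * yc / zc + cy))
    return pixels


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
keypoints_3d = bounding_box_keypoints((10.0, 10.0, 10.0))
keypoints_2d = project_points(
    keypoints_3d, identity, (0.0, 0.0, 1000.0), 2000.0, (512.0, 512.0)
)
```

Because the intrinsics enter only through `focal_px` and `principal_point`, changing the acquisition geometry changes the projection without altering the 3D key-points, which is the property a geometry-aware PnP step relies on.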
Abstract
Accurate 6-DoF pose estimation of surgical instruments during minimally invasive surgery can substantially improve treatment strategies and surgical outcomes. Existing deep learning methods achieve accurate results, but they require a custom approach for each object and laborious setup and training environments, often extending to extensive simulations, and they do not run in real time. We propose three contributions: a general-purpose approach to data acquisition for 6-DoF pose estimation tasks in X-ray systems; a novel, general-purpose YOLOv5-6D pose architecture for accurate and fast object pose estimation; and a complete method for estimating surgical screw pose from a monocular cone-beam X-ray image while taking the acquisition geometry into account. The proposed YOLOv5-6D pose model achieves competitive results on public benchmarks while being considerably faster, at 42 FPS on GPU. In addition, the method generalizes across varying X-ray acquisition geometries and levels of semantic image complexity, enabling accurate pose estimation across different domains. Finally, the proposed approach is tested on bone-screw pose estimation for computer-aided guidance during spine surgery. The model achieves 92.41% on the ADD-S metric at a threshold of 0.1 of the object diameter, demonstrating a promising approach for enhancing surgical precision and patient outcomes. The code for YOLOv5-6D is publicly available at https://github.com/cviviers/YOLOv5-6D-Pose
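For readers unfamiliar with the reported metric, the sketch below illustrates the commonly used definitions of ADD and ADD-S; it is not taken from the authors' code. ADD averages the distance between corresponding model points under the ground-truth and predicted poses, while ADD-S, intended for symmetric objects, uses the distance to the closest point instead. A pose counts as correct when the error is below a fraction (here 0.1) of the object diameter. The function names and the toy cube model are assumptions made for illustration.

```python
import math
from itertools import product


def transform(points, rotation, translation):
    """Apply the rigid transform X' = R @ X + t to each 3D point."""
    return [
        tuple(
            sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
            for r in range(3)
        )
        for p in points
    ]


def add_error(points, gt_pose, pred_pose):
    """ADD: mean distance between corresponding transformed model points."""
    gt = transform(points, *gt_pose)
    pred = transform(points, *pred_pose)
    return sum(math.dist(a, b) for a, b in zip(gt, pred)) / len(points)


def add_s_error(points, gt_pose, pred_pose):
    """ADD-S: mean distance from each ground-truth point to its closest
    predicted point, which tolerates symmetric ambiguities."""
    gt = transform(points, *gt_pose)
    pred = transform(points, *pred_pose)
    return sum(min(math.dist(a, b) for b in pred) for a in gt) / len(points)


def is_correct(error, diameter, threshold=0.1):
    """A pose is accepted when the error is below threshold * diameter."""
    return error < threshold * diameter


# Toy model: the 8 corners of a cube, which is symmetric under a
# 180-degree rotation about the z-axis.
cube = [tuple(float(s) for s in signs) for signs in product((-1, 1), repeat=3)]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
half_turn_z = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]

gt_pose = (identity, (0.0, 0.0, 0.0))
flipped_pose = (half_turn_z, (0.0, 0.0, 0.0))

add_value = add_error(cube, gt_pose, flipped_pose)
add_s_value = add_s_error(cube, gt_pose, flipped_pose)
```

In this toy case the flipped pose is visually indistinguishable from the ground truth, so ADD-S is zero while ADD is large, which is why the symmetric variant is preferred for objects such as screws that look alike under rotation about their axis.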
