DD-VNB: A Depth-based Dual-Loop Framework for Real-time Visually Navigated Bronchoscopy

Qingyao Tian, Huai Liao, Xinyan Huang, Jian Chen, Zihui Zhang, Bingyu Yang, Sebastien Ourselin, Hongbin Liu

TL;DR

The paper targets real-time 6-DOF localization in visually navigated bronchoscopy by decoupling depth estimation from pose tracking. It introduces a knowledge-embedded monocular depth estimator guided by view synthesis, and a dual-loop localization that interleaves fast ego-motion inference with depth-map registration to a pre-operative airway model. The approach achieves superior depth accuracy and robust, near-video-rate localization across phantom and in-vivo data without case-wise retraining, demonstrated by ATE in the few-millimeter range and high frame rates. This framework holds promise for clinically practical real-time bronchoscopic navigation and guidance, with potential improvements from feature matching and relocalization strategies as future work.

Abstract

Real-time 6-DOF localization of bronchoscopes is crucial for enhancing intervention quality. However, current vision-based technologies struggle to balance generalization to unseen data against computational speed. In this study, we propose a Depth-based Dual-Loop framework for real-time Visually Navigated Bronchoscopy (DD-VNB) that generalizes across patient cases without retraining. The DD-VNB framework integrates two key modules: depth estimation and dual-loop localization. To address the domain gap among patients, we propose a knowledge-embedded depth estimation network that maps endoscope frames to depth, ensuring generalization by eliminating patient-specific textures. The network embeds view synthesis knowledge into a cycle adversarial architecture for scale-constrained monocular depth estimation. For real-time performance, our localization module embeds a fast ego-motion estimation network into the loop of depth registration. The ego-motion inference network estimates the pose change of the bronchoscope at high frequency, while depth registration against the pre-operative 3D model periodically provides the absolute pose. Specifically, the relative pose changes are fed into the registration process as the initial guess to boost its accuracy and speed. Experiments on phantom and in-vivo patient data demonstrate the effectiveness of our framework: 1) monocular depth estimation outperforms state-of-the-art methods, 2) localization achieves an Absolute Tracking Error (ATE) of 4.7 $\pm$ 3.17 mm on phantom data and 6.49 $\pm$ 3.88 mm on patient data, 3) the frame rate approaches video capture speed, and 4) no case-wise network retraining is needed. The framework's speed and accuracy demonstrate its promising clinical potential for real-time bronchoscopic navigation.
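
The interplay between the two loops described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the function names (`make_pose`, `dual_loop`, `ideal_register`, `absolute_tracking_error`), the registration period `m = 5`, the 10% forward drift, and the identity rotations are all assumptions made for illustration. The registration step is replaced by an idealized stand-in that simply returns the ground-truth pose, and the ATE is assumed here to be the mean and standard deviation of the Euclidean position error.

```python
import numpy as np

def make_pose(translation):
    """Build a 4x4 homogeneous pose with identity rotation (kept simple for the sketch)."""
    pose = np.eye(4)
    pose[:3, 3] = translation
    return pose

def absolute_tracking_error(estimated, ground_truth):
    """Mean and std of the Euclidean position error (assumed ATE definition)."""
    errors = [np.linalg.norm(e[:3, 3] - g[:3, 3])
              for e, g in zip(estimated, ground_truth)]
    return float(np.mean(errors)), float(np.std(errors))

def dual_loop(relative_poses, register, m):
    """Compose relative poses every frame; every m frames, hand the accumulated
    pose to `register` as its initial guess and adopt the returned absolute pose."""
    pose = np.eye(4)
    trajectory = []
    for k, delta in enumerate(relative_poses, start=1):
        pose = pose @ delta              # fast ego-motion loop
        if k % m == 0:
            pose = register(k, pose)     # slow registration loop
        trajectory.append(pose.copy())
    return trajectory

# Toy data: the camera advances 1 mm per frame, but the ego-motion
# estimate overshoots by 10%, so the error grows with every frame.
n_frames = 10
ground_truth = [make_pose([0.0, 0.0, float(k)]) for k in range(1, n_frames + 1)]
drifting_steps = [make_pose([0.0, 0.0, 1.1])] * n_frames

def ideal_register(frame_index, initial_pose):
    # Hypothetical stand-in: a real registration would refine `initial_pose`
    # against the pre-operative airway model. Here it returns the true pose.
    return ground_truth[frame_index - 1].copy()

ego_only = dual_loop(drifting_steps, ideal_register, m=n_frames + 1)  # never registers
with_registration = dual_loop(drifting_steps, ideal_register, m=5)

print("ego-motion only:", absolute_tracking_error(ego_only, ground_truth))
print("dual loop      :", absolute_tracking_error(with_registration, ground_truth))
```

In this toy run the periodic reset bounds the drift, so the dual-loop trajectory has a lower mean error than pure incremental tracking. With a real registration step the benefit would also depend on how well the accumulated pose works as an initial guess.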

Paper Structure

This paper contains 15 sections, 10 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Experimental setup: (a) using a phantom, (b) acquiring patient data.
  • Figure 2: Overview of the proposed framework, where $ref$ denotes the reference time point. During intervention, we first estimate the depth map of the incoming bronchoscopic frame. Then, a dual-loop scheme locates the camera. The ego-motion loop tracks the camera position in real time by inferring the camera movement between a pair of input depth maps. The registration loop infers the global pose by referring to the pre-operative airway map and eliminates the error accumulated by ego-motion estimation. For the next iteration, $ref+m$ serves as the next reference time point, and $P_{ref+m}$ is used as the initial value for registration.
  • Figure 3: Our depth estimation network training incorporates scale-awareness by combining unpaired image-to-image translation with view synthesis, enforcing view consistency during training. In the $X \rightarrow Z$ direction (lower half), depth maps $\hat{z}_t$ and $\hat{z}_{t-n}$ are generated for frames $x_t$ and $x_{t-n}$, and camera motion is inferred by the pretrained ego-motion estimation network. With depth and motion, the view-synthesized image $w\left(x_{t-n}\right)$ and the reprojected depth $\hat{z}_t^{t-n}$ are obtained, enforcing view consistency between $x_t$ and $w\left(x_{t-n}\right)$, and geometry consistency between $\hat{z}_t$ and $\hat{z}_t^{t-n}$. In the $Z \rightarrow X$ direction (upper half), ground-truth pose and depth from virtual bronchoscopy yield $w\left(\hat{x}_{t-n}\right)$, enforcing view consistency with $\hat{x}_t$. The adversarial loss in the diagram combines the discriminators $D_{{\text{depth}}}$ and $D_{{\text{image}}}$.
  • Figure 4: Quantitative depth evaluations. The original input image, depth ground truth, predicted depth maps and error heatmaps by our depth estimation, ours w/o view consistency loss for generated bronchoscopic frame, ours w/o view consistency loss, CycleGAN and EndoSLAM are shown from left to right.
  • Figure 5: Example of located virtual views using different localization frameworks. E represents ego-motion estimation and R represents registration. Incremental tracking methods (including EndoSLAM and DD-VNB w/o R) are not included because most of their located views are outside the airway model. Frames where tracking was lost are boxed in red.
  • ...and 1 more figures
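
The view synthesis step mentioned in the Figure 3 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's training code: it assumes a pinhole camera with intrinsics `K`, a grayscale image, and nearest-neighbour sampling (training pipelines typically use differentiable bilinear sampling instead). The function name `warp_source_to_target` and all numeric values are hypothetical.

```python
import numpy as np

def warp_source_to_target(src_image, tgt_depth, K, T_tgt_to_src):
    """Synthesize the target view by sampling the source image.

    Each target pixel is back-projected with its depth, moved into the source
    camera frame, and projected with the pinhole intrinsics K. Returns the
    warped image and a mask of pixels that landed inside the source image.
    """
    h, w = tgt_depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pixels = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)

    # Back-project target pixels to 3D points in the target camera frame.
    points_tgt = (np.linalg.inv(K) @ pixels) * tgt_depth.reshape(1, -1)
    points_h = np.vstack([points_tgt, np.ones((1, h * w))])

    # Move the points into the source camera frame and project them.
    points_src = (T_tgt_to_src @ points_h)[:3]
    projected = K @ points_src
    z = projected[2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)  # avoid division by zero
    u_src = np.rint(projected[0] / safe_z).astype(int)
    v_src = np.rint(projected[1] / safe_z).astype(int)
    valid = in_front & (u_src >= 0) & (u_src < w) & (v_src >= 0) & (v_src < h)

    warped = np.zeros(h * w, dtype=src_image.dtype)
    warped[valid] = src_image[v_src[valid], u_src[valid]]
    return warped.reshape(h, w), valid.reshape(h, w)

# Toy example: a fronto-parallel surface at depth 10 and focal length 100,
# so a 0.2 translation along x shifts the image by exactly 2 pixels.
h, w = 4, 6
K = np.array([[100.0, 0.0, 3.0],
              [0.0, 100.0, 2.0],
              [0.0, 0.0, 1.0]])
src_image = np.arange(h * w, dtype=float).reshape(h, w)
tgt_image = np.roll(src_image, -2, axis=1)  # what the target camera would see
tgt_depth = np.full((h, w), 10.0)
shift = np.eye(4)
shift[0, 3] = 0.2

warped, mask = warp_source_to_target(src_image, tgt_depth, K, shift)
# View-consistency (photometric) error, evaluated only where the warp is valid.
photometric_error = np.abs(warped - tgt_image)[mask].mean()
print(photometric_error)
```

When the depth and the relative pose are correct, the warped source matches the target wherever the mask is valid, which is the signal a view consistency loss exploits. The reprojected depth used for geometry consistency would follow the same projection, sampling a depth map instead of an image.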