
Motion-Guided Dual-Camera Tracker for Endoscope Tracking and Motion Analysis in a Mechanical Gastric Simulator

Yuelin Zhang, Kim Yan, Chun Ping Lam, Chengyu Fang, Wenxuan Xie, Yufu Qiu, Raymond Shing-Yan Tang, Shing Shin Cheng

TL;DR

This work introduces a vision-based dual-camera tracker for endoscope tip localization inside a realistic mechanical gastric simulator. By combining the Cross-camera Mutual Template Strategy (CMT) with a Mamba-based Motion-guided Prediction Head (MMH), the approach achieves robust 2D tracking and precise 3D localization via stereo disparity, addressing appearance variation, occlusion, and light-induced distortion. The method yields state-of-the-art 2D tracking and substantially improved 3D accuracy, and its motion analysis pipeline differentiates more clearly between expert and novice endoscopists. The proposed framework has practical potential for training, evaluation, and closed-loop control in flexible endoscopy and robotic surgery, and a public project page is available for reproduction.
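To make the "3D localization via stereo disparity" step concrete, the following is a minimal sketch of textbook disparity-based triangulation. It assumes a rectified pinhole stereo pair with a shared focal length and principal point; the function name, parameters, and numeric values are hypothetical and are not taken from the paper or its code.

```python
def triangulate_from_disparity(u_left, v_left, u_right,
                               focal_px, baseline_mm, cx, cy):
    """Recover a 3D point (in the left camera frame) from a matched pixel pair.

    Assumes a rectified stereo pair: both views share the same row `v`,
    focal length `focal_px` (pixels) and principal point (`cx`, `cy`).
    All names and values here are illustrative assumptions.
    """
    disparity = u_left - u_right  # horizontal pixel shift between the views
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_mm / disparity  # depth from similar triangles
    x = (u_left - cx) * z / focal_px        # back-project the pixel column
    y = (v_left - cy) * z / focal_px        # back-project the pixel row
    return (x, y, z)


# Hypothetical 2D tracker outputs (tip centre in each view) and calibration.
point = triangulate_from_disparity(
    u_left=700.0, v_left=400.0, u_right=660.0,
    focal_px=800.0, baseline_mm=60.0, cx=640.0, cy=360.0,
)
print(point)
```

Because depth is inversely proportional to disparity, a small 2D tracking error in either view can translate into a large depth error, which is one reason consistent tracking across both cameras matters for 3D accuracy.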

Abstract

Flexible endoscope motion tracking and analysis in mechanical simulators have proven useful for endoscopy training. Common motion tracking methods based on electromagnetic trackers are, however, limited by their high cost and material susceptibility. In this work, a motion-guided dual-camera vision tracker is proposed to provide robust and accurate tracking of the endoscope tip's 3D position. The tracker addresses several unique challenges of tracking a flexible endoscope tip inside a dynamic, life-sized mechanical simulator. To address appearance variation and maintain tracking consistency across the two cameras, the cross-camera mutual template strategy (CMT) is proposed, which introduces dynamic transient mutual templates. To alleviate large occlusion and light-induced distortion, the Mamba-based motion-guided prediction head (MMH) is presented to aggregate historical motion with visual tracking. The proposed tracker outperforms state-of-the-art vision trackers, with 42% and 72% improvements over the second-best method in average error and maximum error, respectively. Further motion analysis involving novice and expert endoscopists also shows that the tip 3D motion provided by the proposed tracker enables more reliable motion analysis and more substantial differentiation between expertise levels than other trackers. Project page: https://github.com/PieceZhang/MotionDCTrack
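As an illustration of how average error, maximum error, and a relative improvement between two trackers could be computed, here is a minimal sketch. It assumes the error is the per-frame Euclidean distance between the estimated and ground-truth 3D tip positions, and that "improvement" means the relative reduction in error; the abstract does not spell out either definition, and all function names and sample values are hypothetical.

```python
import math


def position_errors(estimated, ground_truth):
    """Per-frame Euclidean distance between two 3D trajectories (assumed metric)."""
    if len(estimated) != len(ground_truth):
        raise ValueError("trajectories must have the same number of frames")
    return [math.dist(e, g) for e, g in zip(estimated, ground_truth)]


def summarize(errors):
    """Return (average error, maximum error) over all frames."""
    return sum(errors) / len(errors), max(errors)


def relative_improvement(baseline_error, our_error):
    """Fractional error reduction relative to a baseline (assumed definition)."""
    return (baseline_error - our_error) / baseline_error


# Hypothetical two-frame trajectories, in arbitrary length units.
estimated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]

avg_err, max_err = summarize(position_errors(estimated, ground_truth))
print(avg_err, max_err)
print(relative_improvement(baseline_error=10.0, our_error=5.8))
```

Reporting the maximum alongside the average is useful because a tracker can have a low mean error while still drifting badly during brief occlusions.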

Paper Structure (12 sections, 5 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Structure overview. $\varphi_1$ to $\varphi_5$ denote the layers in the Siamese ResNet (He et al., 2016) backbone; $\varphi_n(x_i)$ and $\varphi_n(x_j)$ are the intermediate outputs of the backbone, where $n\in \{3,4,5\}$ and $\{i,j\} = \{1,2\}$. Three CMTs are cascaded behind $\varphi_3$, $\varphi_4$, and $\varphi_5$. Each of the three CMTs is then followed by an MMH. For simplicity, the figure shows details only for CMT(5,2) and MMH(5,1). All CMTs and MMHs follow the same workflow.
  • Figure 2: Experiment setup. (a) Self-developed mechanical gastric simulator and installation of dual-camera tracking devices. (b) Flexible gastric endoscope with EMT affixed at its tip to provide the 3D ground truth. (c) Dual camera pairs used in this work and image examples collected by different camera pairs.
  • Figure 3: Left: Demonstration of the dual-camera tracking comparison. Our tracker not only achieves the most accurate tracking under multiple disturbances but also has the best tracking consistency across the dual cameras (see the supplementary video for more tracking demonstrations). Right: 3D motion trajectory ground truth measured by EMT and estimated 3D motion from different methods.
  • Figure 4: Demonstration of the procedures performed during motion analysis.