Motion-Guided Dual-Camera Tracker for Endoscope Tracking and Motion Analysis in a Mechanical Gastric Simulator
Yuelin Zhang, Kim Yan, Chun Ping Lam, Chengyu Fang, Wenxuan Xie, Yufu Qiu, Raymond Shing-Yan Tang, Shing Shin Cheng
TL;DR
This work introduces a vision-based dual-camera tracker for localizing the endoscope tip inside a realistic mechanical gastric simulator. By combining a cross-camera mutual template strategy (CMT) with a Mamba-based motion-guided prediction head (MMH), the approach achieves robust 2D tracking and precise 3D localization via stereo disparity, addressing appearance variation, occlusion, and light-induced distortion. The method outperforms state-of-the-art vision trackers, with improvements of 42% in average error and 72% in maximum error over the second-best method, and its motion analysis pipeline differentiates more clearly between expert and novice endoscopists. The framework has practical potential for training, evaluation, and closed-loop control in flexible endoscopy and robotic surgery, and a public project page is available.
Abstract
Flexible endoscope motion tracking and analysis in mechanical simulators have proven useful for endoscopy training. Common motion tracking methods based on electromagnetic trackers are, however, limited by their high cost and material susceptibility. In this work, a motion-guided dual-camera vision tracker is proposed to provide robust and accurate tracking of the endoscope tip's 3D position. The tracker addresses several challenges unique to tracking a flexible endoscope tip inside a dynamic, life-sized mechanical simulator. To handle appearance variation and maintain consistency between the two camera views, a cross-camera mutual template strategy (CMT) is proposed, which introduces dynamic transient mutual templates. To alleviate large occlusion and light-induced distortion, a Mamba-based motion-guided prediction head (MMH) is presented, which aggregates historical motion with visual tracking. The proposed tracker outperforms state-of-the-art vision trackers, with improvements of 42% in average error and 72% in maximum error over the second-best method. Further motion analysis involving novice and expert endoscopists shows that, compared with other trackers, the 3D tip motion provided by the proposed tracker enables more reliable motion analysis and more substantial differentiation between expertise levels. Project page: https://github.com/PieceZhang/MotionDCTrack
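
To illustrate how a dual-camera 2D track can yield a 3D tip position via stereo disparity, the following is a minimal sketch rather than the authors' implementation. It assumes a rectified, horizontally aligned pinhole stereo pair with a shared focal length; the function name, parameters, and numeric values are hypothetical and do not come from the paper.

```python
from typing import Tuple


def triangulate_from_disparity(
    left_px: Tuple[float, float],
    right_px: Tuple[float, float],
    focal_px: float,
    baseline_mm: float,
    principal_point: Tuple[float, float],
) -> Tuple[float, float, float]:
    """Recover a 3D point (mm, left-camera frame) from a rectified stereo match.

    Assumption (not from the paper): both images are rectified so that the
    tracked tip lies on the same row in each view, and depth follows the
    standard pinhole relation Z = f * B / d.
    """
    u_left, v_left = left_px
    u_right, _ = right_px

    # Disparity is the horizontal shift of the tip between the two views.
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")

    # Depth along the optical axis.
    z = focal_px * baseline_mm / disparity

    # Back-project the left-image pixel to metric X and Y at that depth.
    cx, cy = principal_point
    x = (u_left - cx) * z / focal_px
    y = (v_left - cy) * z / focal_px
    return x, y, z


# Hypothetical tip centers reported by a 2D tracker in each camera view.
tip_xyz = triangulate_from_disparity(
    left_px=(700.0, 400.0),
    right_px=(600.0, 400.0),
    focal_px=1000.0,
    baseline_mm=50.0,
    principal_point=(640.0, 360.0),
)
print(tip_xyz)
```

Because depth is inversely proportional to disparity, a small 2D tracking error in either view can translate into a large 3D error when the disparity is small, which is one reason consistent tracking across the two cameras matters for the 3D result.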
