EasyVis2: A Real Time Multi-view 3D Visualization System for Laparoscopic Surgery Training Enhanced by a Deep Neural Network YOLOv8-Pose
Yung-Hong Sun, Gefei Shen, Jiangang Chen, Jayer Fernandes, Amber L. Shada, Charles P. Heise, Hongrui Jiang, Yu Hen Hu
TL;DR
EasyVis2 addresses the lack of depth perception in laparoscopic surgery by providing real-time multi-view 3D visualization using a five-camera array and markerless tool pose estimation. It extends the EasyVis framework with YOLOv8-Pose, which detects a 2D tool skeleton in each view; multi-view triangulation then reconstructs the 3D tool pose, and the tool is rendered over a live background. The work is supported by a dedicated ST-Pose dataset. A semi-automatic data collection and augmentation strategy enables marker-free training of the 4-point grasper model, achieving 2D pose precision of up to $\approx 0.993$ and improved 3D reconstruction quality, with per-frame processing of around $12.6$ ms for five views. The results demonstrate real-time performance, substantial improvements over the baseline EasyVis in 3D reconstruction metrics, and strong potential for deployment in laparoscopic surgery training and, prospectively, in real-world surgery.
Abstract
EasyVis2 is a system designed to provide hands-free, real-time 3D visualization for laparoscopic surgery. It incorporates a surgical trocar equipped with an array of micro-cameras, which can be inserted into the body cavity to offer an enhanced field of view and a 3D perspective of the surgical procedure. A deep neural network, YOLOv8-Pose, is used to estimate the position and orientation of surgical instruments in each camera view. These multi-view estimates are combined to compute the 3D poses of the surgical tools, from which a 3D surface model of the instruments is rendered over the background scene for real-time visualization. This study presents methods for adapting YOLOv8-Pose to the EasyVis2 system, including the development of a tailored training dataset. Experimental results demonstrate that, with an identical number of cameras, the new system improves 3D reconstruction accuracy and reduces computation time relative to the original EasyVis. Additionally, the adapted YOLOv8-Pose system shows high accuracy in 2D pose estimation.
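The step from per-view 2D keypoints to a 3D tool pose can be illustrated with a minimal sketch of linear (DLT) triangulation. The text above does not state which triangulation method EasyVis2 uses, so DLT is an assumption here, chosen because it is a standard approach. The intrinsics `K`, the five camera positions, the test point, and the helper names `make_camera`, `project`, and `triangulate_point` are all hypothetical values and names introduced for illustration only.

```python
import numpy as np


def make_camera(K, R, t):
    """Build a 3x4 projection matrix P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])


def project(P, X):
    """Project a 3D point X (shape (3,)) to pixel coordinates (u, v)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_point(projections, pixels):
    """Linear (DLT) triangulation of one 3D point from N >= 2 views.

    Each view contributes two rows to a homogeneous system A X = 0;
    the solution is the right singular vector of A with the smallest
    singular value.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize


# Hypothetical setup: five cameras in a row, all looking along +z.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
centers = [np.array([cx, 0.0, 0.0]) for cx in (-0.02, -0.01, 0.0, 0.01, 0.02)]
cameras = [make_camera(K, R, -R @ c) for c in centers]

# One tool keypoint (e.g. a grasper tip) about 10 cm in front of the array.
X_true = np.array([0.01, -0.02, 0.10])

# In a real pipeline these pixels would come from the 2D pose detector.
pixels = [project(P, X_true) for P in cameras]

X_hat = triangulate_point(cameras, pixels)
```

With noise-free observations, `X_hat` matches `X_true` up to floating-point error. With real detections, each of the four grasper keypoints would be triangulated in the same way, and a nonlinear refinement or outlier rejection step might be added; neither is shown here.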
