Tech Report: Divide and Conquer 3D Real-Time Reconstruction for Improved IGS
Yicheng Zhu
TL;DR
This work presents a modular, config-driven pipeline for real-time 3D reconstruction in endoscopic image-guided surgery, combining frame selection, depth estimation, and ICP-based fusion. It integrates Depth-Anything V2 and EndoDAC for depth estimation, and adds adaptive ICP thresholding and visualization to handle challenging ESSBS scenes. Experiments on the Hamlyn dataset show that Depth-Anything V2 generally outperforms alternatives and that adaptive thresholding can improve alignment, though relative depth scales and ground-truth inconsistencies remain open problems. The framework offers a flexible foundation for building real-time, clinically relevant 3D models in endoscopy, and points to the improvements in pose estimation and data preprocessing that robust deployment will require.
Abstract
Tracking surgical modifications from endoscopic video is technically feasible and clinically valuable, but it remains challenging. This report presents a modular pipeline that divides and conquers the clinical challenges involved. The pipeline integrates frame selection, depth estimation, and 3D reconstruction components, so that new methods can be incorporated with little change to the rest of the system. Recent advancements are detailed, including the integration of Depth-Anything V2 and EndoDAC for depth estimation and improvements to the Iterative Closest Point (ICP) alignment process. Experiments conducted on the Hamlyn dataset demonstrate the effectiveness of the integrated methods. The system's capabilities and limitations are both discussed.
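
To make the modular, config-driven idea concrete, the following is a minimal sketch of how three interchangeable stages could be selected from a configuration. It is not the report's implementation: the registry, the function names (`register`, `run_pipeline`), the config keys, and the stub stages are all hypothetical, and the stubs only stand in for real frame selection, learned depth estimation (for example Depth-Anything V2 or EndoDAC), and ICP-based fusion.

```python
from typing import Callable, Dict

# Hypothetical registry: stage name -> {method name -> callable}.
REGISTRY: Dict[str, Dict[str, Callable]] = {
    "frame_selection": {},
    "depth_estimation": {},
    "fusion": {},
}


def register(stage: str, name: str):
    """Decorator that adds a method to the registry under a stage."""
    def decorator(fn: Callable) -> Callable:
        REGISTRY[stage][name] = fn
        return fn
    return decorator


@register("frame_selection", "every_nth")
def every_nth(frames, step=2):
    # Stub: keep every `step`-th frame.
    return frames[::step]


@register("depth_estimation", "constant_stub")
def constant_stub(frame, value=1.0):
    # Stub standing in for a depth network: one depth value per pixel.
    return [[value for _ in row] for row in frame]


@register("fusion", "concat_stub")
def concat_stub(depth_maps):
    # Stub standing in for ICP-based fusion: collects (x, y, depth)
    # points from every depth map without performing any alignment.
    points = []
    for depth in depth_maps:
        for y, row in enumerate(depth):
            for x, d in enumerate(row):
                points.append((x, y, d))
    return points


def run_pipeline(frames, config):
    """Look up each stage's method by name and run the stages in order."""
    stages = {}
    for stage in ("frame_selection", "depth_estimation", "fusion"):
        entry = config[stage]
        stages[stage] = (REGISTRY[stage][entry["method"]], entry.get("params", {}))

    select, select_params = stages["frame_selection"]
    estimate, estimate_params = stages["depth_estimation"]
    fuse, fuse_params = stages["fusion"]

    selected = select(frames, **select_params)
    depths = [estimate(f, **estimate_params) for f in selected]
    return fuse(depths, **fuse_params)


# Hypothetical configuration; in practice this could be loaded from a file.
config = {
    "frame_selection": {"method": "every_nth", "params": {"step": 2}},
    "depth_estimation": {"method": "constant_stub", "params": {"value": 1.0}},
    "fusion": {"method": "concat_stub"},
}

frames = [[[0, 0], [0, 0]] for _ in range(4)]  # four dummy 2x2 frames
cloud = run_pipeline(frames, config)
```

With this layout, swapping one depth estimator for another would amount to registering a new function and changing the `method` string in the configuration, leaving the other stages untouched.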
