Tech Report: Divide and Conquer 3D Real-Time Reconstruction for Improved IGS

Yicheng Zhu

TL;DR

This work presents a modular, config-driven pipeline for real-time 3D reconstruction in endoscopic image-guided surgery, combining frame selection, depth estimation, and ICP-based fusion. It demonstrates integration of Depth-Anything V2 and EndoDAC, with adaptive ICP thresholding and visualization to handle challenging ESSBS scenes. Experiments on the Hamlyn dataset show that Depth-Anything V2 generally outperforms alternatives and that adaptive thresholding can improve alignment, though challenges remain from relative depth scales and ground-truth inconsistencies. The framework offers a flexible foundation for real-time, clinically relevant 3D models in endoscopy, while pointing to necessary improvements in pose estimation and data preprocessing for robust deployment.

Abstract

Tracking surgical modifications from endoscopic video is technically feasible and clinically valuable, yet it remains challenging. This report presents a modular pipeline that takes a divide-and-conquer approach to the clinical challenges involved. The pipeline integrates frame selection, depth estimation, and 3D reconstruction components, so new methods can be incorporated flexibly. The report details recent advancements, including the integration of Depth-Anything V2 and EndoDAC for depth estimation and improvements to the Iterative Closest Point (ICP) alignment process. Experiments on the Hamlyn dataset demonstrate the effectiveness of the integrated methods, and both the capabilities and the limitations of the system are discussed.

Paper Structure

This paper contains 43 sections, 10 equations, 11 figures, and 2 tables.

Figures (11)

  • Figure 3: Heatmap of the per-pixel alignment error at the first (left) and tenth (right) ICP iteration. Darker areas indicate masked-out regions, while dark blue areas represent smaller errors.
  • Figure 4: Visualization comparing Depth-Anything V2 and EndoDAC results on a sample frame whose ground truth turned out to be erroneous. [The 25th frame of rectified from Hamlyn.]
  • Figure 5: Four plots for the constant-value method
  • Figure 6: Four plots for the 90th-percentile method
  • Figure 7: Four plots for the mean method
  • ...and 6 more figures
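
The captions of Figures 5 to 7 name three ways of choosing the ICP threshold: a constant value, the 90th percentile, and the mean. This excerpt does not give the report's exact definitions, so the following is only a minimal sketch. It assumes each rule is applied to the nearest-neighbor distances between the source and target point clouds. The function name `icp_distance_threshold`, its parameters, and the sample values are hypothetical and used for illustration only.

```python
import numpy as np


def icp_distance_threshold(distances, method="percentile",
                           constant=0.05, percentile=90.0):
    """Pick a maximum correspondence distance for one ICP iteration.

    `distances` holds per-point nearest-neighbor distances (assumed input).
    The three methods mirror the names in the figure captions; the exact
    formulas used in the report are an assumption here.
    """
    if method == "constant":
        # Fixed threshold, independent of the current residuals.
        return float(constant)

    d = np.asarray(distances, dtype=float)
    d = d[np.isfinite(d)]  # ignore masked-out or invalid residuals
    if d.size == 0:
        raise ValueError("no valid distances to derive a threshold from")

    if method == "percentile":
        # Keep roughly the closest `percentile` percent of correspondences.
        return float(np.percentile(d, percentile))
    if method == "mean":
        # Threshold at the average residual.
        return float(d.mean())
    raise ValueError(f"unknown method: {method!r}")


# Hypothetical residuals with one outlier and one masked (NaN) point.
residuals = np.array([0.01, 0.02, 0.02, 0.03, 0.50, np.nan])
thresholds = {m: icp_distance_threshold(residuals, method=m)
              for m in ("constant", "percentile", "mean")}
```

With data-driven rules such as the percentile or mean, the threshold shrinks as the alignment improves, which is one plausible reading of "adaptive thresholding" in the TL;DR. A constant threshold stays fixed regardless of the residuals.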