EndoDDC: Learning Sparse to Dense Reconstruction for Endoscopic Robotic Navigation via Diffusion Depth Completion

Yinheng Lin, Yiming Huang, Beilei Cui, Long Bai, Huxin Gao, Hongliang Ren, Jiewen Lai

TL;DR

EndoDDC is an endoscopy depth completion method that integrates images and sparse depth information with depth gradient features and optimizes depth maps through a diffusion model, addressing the weak texture and light reflection found in endoscopic environments.

Abstract

Accurate depth estimation plays a critical role in the navigation of endoscopic surgical robots, forming the foundation for 3D reconstruction and safe instrument guidance. Fine-tuning pretrained models relies heavily on endoscopic surgical datasets with precise depth annotations. While existing self-supervised depth estimation techniques eliminate the need for accurate depth annotations, their performance degrades in environments with weak textures and variable lighting, leading to sparse reconstructions with invalid depth estimates. Depth completion using sparse depth maps can mitigate these issues and improve accuracy. Despite advances in depth completion techniques in general domains, their application in endoscopy remains limited. To overcome these limitations, we propose EndoDDC, an endoscopy depth completion method that integrates images and sparse depth information with depth gradient features and optimizes depth maps through a diffusion model, addressing the issues of weak texture and light reflection in endoscopic environments. Extensive experiments on two publicly available endoscopy datasets show that our approach outperforms state-of-the-art models in both depth accuracy and robustness. This demonstrates the potential of our method to reduce visual errors in complex endoscopic environments. Our code will be released at https://github.com/yinheng-lin/EndoDDC.

Paper Structure

This paper contains 16 sections, 12 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Compared with fine-tuning the SOTA foundational model, our method generates more robust and accurate depth from sparse depth and RGB input, achieving superior sparse to dense reconstruction.
  • Figure 2: Overview of EndoDDC: After feature extraction from the RGB image and sparse depth map, the Depth Grad Fusion module iteratively updates the state hidden network based on depth and gradient features. This output is then fed into the Depth Diffusion guidance model to conditionally optimize the initial depth. Finally, the optimized coarse depth map passes through up-sampling and the SPN refinement module to produce the final depth map.
  • Figure 3: Details of the proposed model. (a) Details of the Depth Grad Fusion module; (b) The architectural design of the Depth Diffusion module.
  • Figure 4: Qualitative comparison on the C3VD and StereoMIS datasets. We compare EndoDDC with SOTA depth estimation and depth completion methods; our method produces lower error in tissue details.
  • Figure 5: Results of EndoDDC at different levels of sparsity. Our method generates robust dense depth across different levels of sparsity.
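
To make the notion of "different levels of sparsity" in Figure 5 concrete, the following minimal sketch shows one way a sparse depth input could be simulated from a dense depth map by keeping a random fraction of valid pixels. This is an illustration under stated assumptions, not the sampling procedure used by EndoDDC: the function name `sparsify_depth`, the `keep_ratio` parameter, the uniform random sampling, and the convention that zero marks a missing measurement are all hypothetical choices made for this example.

```python
import random


def sparsify_depth(dense, keep_ratio, seed=0):
    """Keep a random subset of valid depth pixels (hypothetical helper).

    dense      -- 2D list of depth values; values <= 0 are treated as invalid
    keep_ratio -- fraction of valid pixels to keep, in [0, 1]
    Returns (sparse, mask): sparse holds 0.0 where no measurement is kept,
    and mask is True exactly at the kept pixels.
    """
    if not 0.0 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in [0, 1]")
    rng = random.Random(seed)
    height, width = len(dense), len(dense[0])
    # Only pixels with a positive depth count as valid measurements.
    valid = [(r, c) for r in range(height) for c in range(width)
             if dense[r][c] > 0]
    n_keep = round(keep_ratio * len(valid))
    kept = set(rng.sample(valid, n_keep))
    sparse = [[dense[r][c] if (r, c) in kept else 0.0 for c in range(width)]
              for r in range(height)]
    mask = [[(r, c) in kept for c in range(width)] for r in range(height)]
    return sparse, mask


# Toy 4x5 "depth map" in arbitrary units, purely illustrative.
dense = [[10.0 + r + 0.5 * c for c in range(5)] for r in range(4)]
for ratio in (0.5, 0.1):
    sparse, mask = sparsify_depth(dense, ratio)
    kept_count = sum(sum(row) for row in mask)
    print(f"keep_ratio={ratio}: {kept_count} of 20 pixels kept")
```

A depth completion model would then take the RGB image together with `sparse` (and typically `mask`) as input and be evaluated against `dense`; lowering `keep_ratio` gives a harder, sparser input.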