ToDER: Towards Colonoscopy Depth Estimation and Reconstruction with Geometry Constraint Adaptation

Zhenhua Wu, Yanlin Jin, Liangdong Qiu, Xiaoguang Han, Xiang Wan, Guanbin Li

TL;DR

ToDER tackles the challenge of depth estimation and 3D reconstruction in optical colonoscopy by leveraging a bi-directional domain adaptation framework with a dedicated TNet to enforce geometric constraints. The approach combines style translation between synthetic and realistic domains, dual depth networks, and a geometry-aware refinement step, followed by surfel-based reconstruction. Experimental results on synthetic and realistic data demonstrate superior depth accuracy and high-quality reconstructions, with ablations confirming the effectiveness of bi-directional adaptation and TNet. The method offers a practical pathway to visualize unobserved colon regions and potentially reduce misdiagnoses in clinical practice.
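The bi-directional style translation between synthetic and realistic frames can be illustrated with a CycleGAN-style cycle-consistency term. This summary does not give ToDER's GAN losses, so the following is only a sketch under that assumption. It is a minimal NumPy version in which the hypothetical functions `to_real` and `to_synth` stand in for the two trained translation networks.

```python
import numpy as np


def l1_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute difference between two images of the same shape."""
    return float(np.mean(np.abs(a - b)))


def cycle_consistency_loss(x_src, x_tgt, g_src2tgt, g_tgt2src) -> float:
    """CycleGAN-style cycle loss (an assumed stand-in for ToDER's GAN objective):
    map each image to the other domain and back, then penalize its L1 distance
    from the original image."""
    src_cycle = g_tgt2src(g_src2tgt(x_src))  # synthetic -> realistic -> synthetic
    tgt_cycle = g_src2tgt(g_tgt2src(x_tgt))  # realistic -> synthetic -> realistic
    return l1_distance(src_cycle, x_src) + l1_distance(tgt_cycle, x_tgt)


# Hypothetical stand-in "generators": an affine color shift and its exact inverse.
def to_real(img: np.ndarray) -> np.ndarray:
    return 0.8 * img + 0.1


def to_synth(img: np.ndarray) -> np.ndarray:
    return (img - 0.1) / 0.8


rng = np.random.default_rng(0)
x_s = rng.random((4, 4, 3))  # toy "synthetic" frame
x_t = rng.random((4, 4, 3))  # toy "realistic" frame
print(cycle_consistency_loss(x_s, x_t, to_real, to_synth))  # close to 0 for exact inverses
```

In a real system the generators are neural networks. A term like this is typically combined with adversarial losses so that a translated frame keeps the colon geometry of its source frame while changing only its surface style.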

Abstract

Visualizing the colon during colonoscopy is crucial for assisting medical diagnosis, as it helps prevent polyps from going undetected in regions that were not fully observed. Traditional feature-based and depth-based reconstruction approaches often produce poor results on realistic colonoscopy videos because of incorrect point matching or imprecise depth estimation. Modern deep learning-based methods typically require large amounts of ground-truth data, which are hard to obtain in optical colonoscopy. Self-supervised and domain adaptation methods have been explored to address this issue. However, these methods neglect geometry constraints and are less accurate at predicting fine depth details. We therefore propose ToDER, a novel reconstruction pipeline built on a bi-directional adaptation architecture, to obtain precise depth estimates. Within this architecture, we carefully design a TNet module that imposes geometry constraints and improves depth quality. The estimated depth is then used to reconstruct a reliable colon model for visualization. Experimental results show that our approach predicts depth maps more precisely than other self-supervised and domain adaptation methods on both realistic and synthetic colonoscopy videos. Results on realistic colonoscopy also show great potential for visualizing unobserved regions and preventing misdiagnoses.
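Reconstruction turns each estimated depth map into 3D geometry. The sketch below shows the standard pinhole back-projection that relates depth, camera intrinsics and camera pose. It is not the paper's SurfelMeshing fusion itself. It assumes z-depth rather than ray length, a 3x3 intrinsic matrix `K`, and a 4x4 camera-to-world pose. The function name and intrinsic values are hypothetical. COLMAP reports world-to-camera poses, so a COLMAP pose would need to be inverted before being passed in here.

```python
import numpy as np


def backproject_depth(depth: np.ndarray, K: np.ndarray, cam_to_world: np.ndarray) -> np.ndarray:
    """Lift an HxW z-depth map to an (H*W, 3) array of world-space points.

    Uses the pinhole model X_cam = depth(u, v) * K^-1 [u, v, 1]^T, then applies
    the 4x4 camera-to-world transform.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pixels @ np.linalg.inv(K).T              # one ray per pixel, with z = 1
    pts_cam = rays * depth.reshape(-1, 1)           # scale each ray by its depth
    pts_h = np.hstack([pts_cam, np.ones((pts_cam.shape[0], 1))])  # homogeneous coords
    return (pts_h @ cam_to_world.T)[:, :3]


# Hypothetical intrinsics and a toy constant-depth map, for illustration only.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.5],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 4), 2.0)
pts = backproject_depth(depth, K, np.eye(4))
print(pts.shape)  # (12, 3)
```

A surfel-based system then fuses such per-frame geometry across many poses. This is why errors in either the depth maps or the poses show up directly as noise or discontinuities in the reconstructed colon surface.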

Paper Structure

This paper contains 19 sections, 2 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Colon models reconstructed from two realistic colonoscopy videos by our method and by COLMAP [schonberger2016structure]. Our method reconstructs a high-quality colon surface and clearly reveals unobserved areas.
  • Figure 2: Pipeline of our method. The left part gives an overview of ToDER for depth estimation. Starting from Src and Tgt (the source and target domains), the GANs are first initialized to transfer surface styles between the two domains. The DepthNets and TNet are then trained and fine-tuned to predict depth and to improve depth quality, respectively. Each module is supervised by the corresponding guidance indicated by the lines. The right part illustrates the reconstruction process, which uses ToDER to obtain a colon model from realistic or synthetic videos. Two depth maps, predicted from the original target-style frame and from the frame converted to source style, are fused to infer the final depth. A 3D model is finally built by SurfelMeshing, using poses from COLMAP and depths from ToDER. Note that ToDER does not require depth labels for realistic colonoscopy.
  • Figure 3: Depth estimation results on realistic colonoscopy videos. Although camera poses are unknown for realistic videos, ToDER still produces reasonable depths with clear details, whereas other methods yield blurry predictions for fine details. More depth results compared with other methods are shown in a further figure (fig:depth_more).
  • Figure 4: Qualitative results on synthetic videos. ToDER produces more accurate depth estimates than modern depth estimation methods.
  • Figure 5: Colons reconstructed from synthetic data using depth maps from the ground truth, our ToDER, S2R-DepthNet, and Monodepth2. ToDER shows better texture fidelity and continuity (red arrows), along with lower noise levels (blue arrows).
  • ...and 1 more figure
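The Figure 2 caption says that two depth maps are fused, one predicted from the original target-style frame and one from the frame converted to source style. It does not say how they are fused. The sketch below assumes a simple per-pixel weighted average in which a pixel that is invalid in one map falls back to the other map. Both the rule and the function name `fuse_depths` are assumptions for illustration, not ToDER's actual fusion.

```python
import numpy as np


def fuse_depths(d_tgt: np.ndarray, d_src: np.ndarray, w_tgt: float = 0.5) -> np.ndarray:
    """Hypothetical fusion: per-pixel weighted average of two depth maps.

    d_tgt: depth predicted from the original (target-style) frame.
    d_src: depth predicted from the frame converted to source style.
    Pixels that are NaN, infinite, or non-positive in one map are filled from the
    other map; pixels invalid in both are set to 0.
    """
    if d_tgt.shape != d_src.shape:
        raise ValueError("depth maps must have the same shape")
    if not 0.0 < w_tgt < 1.0:
        raise ValueError("w_tgt must lie strictly between 0 and 1")
    valid_t = np.isfinite(d_tgt) & (d_tgt > 0)
    valid_s = np.isfinite(d_src) & (d_src > 0)
    w_t = w_tgt * valid_t            # weight is 0 where the map is invalid
    w_s = (1.0 - w_tgt) * valid_s
    num = w_t * np.where(valid_t, d_tgt, 0.0) + w_s * np.where(valid_s, d_src, 0.0)
    total = w_t + w_s
    # Divide only where at least one map is valid; elsewhere keep the 0 fill value.
    return np.divide(num, total, out=np.zeros_like(total), where=total > 0)


d_t = np.array([[1.0, 2.0], [np.nan, 4.0]])
d_s = np.array([[3.0, 2.0], [5.0, 0.0]])
print(fuse_depths(d_t, d_s))  # fused values: [[2, 2], [5, 4]]
```

Requiring `w_tgt` to lie strictly between 0 and 1 keeps the fallback meaningful. If either weight were zero, a pixel that is valid in only that map would receive no depth at all.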