Enhanced Scale-aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling

Ruofeng Wei; Bin Li; Kai Chen; Yiyao Ma; Yunhui Liu; Qi Dou

TL;DR

This work tackles the problem of obtaining absolute scale in monocular endoscopic depth estimation, where prior methods struggle to infer real-world scale from single images. It introduces an enhanced scale-aware framework that combines multi-resolution depth fusion with geometry-based scale recovery using cylindrical instrument modeling and image-based primitives. Scale is recovered by aligning 3D instrument poses with the relative depth maps through a simple least-squares fit to produce absolute depth; this yields depth maps with accurate scale and sharper boundaries. Experiments on in-house surgical videos and simulator data show state-of-the-art scale accuracy, improved boundary details, and near-real-time performance, suggesting strong potential for practical robotic endoscopy and navigation.
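The TL;DR describes the final step as a least-squares fit between the relative depth map and metric depths derived from the instrument poses. A minimal sketch of that idea follows, assuming the relative depth is correct up to a single unknown scale factor and that metric depths are available at the instrument pixels (for example by rendering the recovered cylinders). Under those assumptions the fit has the closed form s = Σ r·m / Σ r². The function names `fit_scale` and `to_metric` and the mask-based interface are hypothetical, not taken from the paper.

```python
import numpy as np


def fit_scale(rel_depth: np.ndarray, metric_depth: np.ndarray, mask: np.ndarray) -> float:
    """Closed-form least-squares scale s minimising sum over mask of (s * rel - metric)^2.

    Assumption (hypothetical interface): the relative map has a scale-only ambiguity.
    rel_depth    : relative depth map from the monocular network
    metric_depth : metric depth, only needed where mask is True (e.g. rendered tool pixels)
    mask         : boolean map of pixels with a trusted metric depth
    """
    r = rel_depth[mask]
    m = metric_depth[mask]
    # setting d/ds of sum (s*r - m)^2 to zero gives s = (r . m) / (r . r)
    return float(r @ m / (r @ r))


def to_metric(rel_depth: np.ndarray, metric_depth: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Scale the whole relative map by the factor fitted on the instrument pixels."""
    return fit_scale(rel_depth, metric_depth, mask) * rel_depth
```

If the relative map also carries an unknown offset (as affine-invariant predictors often do), a two-parameter scale-and-shift fit solved with `np.linalg.lstsq` would replace this closed form.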

Abstract

Scale-aware monocular depth estimation poses a significant challenge in computer-aided endoscopic navigation. Existing depth estimation methods that do not consider geometric priors struggle to learn the absolute scale when trained on monocular endoscopic sequences. Additionally, conventional methods have difficulty accurately estimating details at tissue and instrument boundaries. In this paper, we tackle these problems by proposing a novel enhanced scale-aware framework that uses only monocular images, together with geometric modeling, for depth estimation. Specifically, we first propose a multi-resolution depth fusion strategy to enhance the quality of monocular depth estimation. To recover the precise scale between relative depth and real-world values, we further calculate the 3D poses of the instruments in the endoscopic scenes using algebraic geometry based on image-only geometric primitives (i.e., the boundaries and tip of each instrument). The 3D poses of the surgical instruments then enable scale recovery of the relative depth maps. By coupling the scale factors with relative depth estimation, the scale-aware depth of monocular endoscopic scenes can be estimated. We evaluate the pipeline on in-house endoscopic surgery videos and simulated data. The results demonstrate that our method can learn the absolute scale with geometric modeling and accurately estimate scale-aware depth for monocular scenes.
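The abstract does not spell out how the multi-resolution depths are fused. Purely as a hedged illustration of the general idea, the sketch below uses a simple frequency split: low-frequency structure comes from a prediction made on a downsampled input, and high-frequency boundary detail comes from a prediction made at full resolution. The functions `box_blur` and `fuse_multires`, the box filter, the kernel size `k`, and the integer resolution factor are all assumptions; this is not the paper's fusion strategy, which may well be learned.

```python
import numpy as np


def box_blur(img: np.ndarray, k: int) -> np.ndarray:
    """Separable box blur with edge padding; k must be odd."""
    pad = k // 2
    kernel = np.ones(k) / k
    padded = np.pad(img, pad, mode="edge")
    # blur along rows, then along columns; "valid" mode trims the padding off again
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def fuse_multires(depth_low: np.ndarray, depth_high: np.ndarray, k: int = 9) -> np.ndarray:
    """Hypothetical frequency-split fusion of two relative depth predictions.

    depth_low  : prediction from a downsampled input (globally consistent, blurry)
    depth_high : prediction from the full-resolution input (sharp boundaries)
    Both maps are assumed to share the same relative scale.
    """
    factor = depth_high.shape[0] // depth_low.shape[0]
    assert depth_low.shape[0] * factor == depth_high.shape[0]
    assert depth_low.shape[1] * factor == depth_high.shape[1]
    up = np.kron(depth_low, np.ones((factor, factor)))  # nearest-neighbour upsampling
    base = box_blur(up, k)                               # low frequencies from the low-res map
    detail = depth_high - box_blur(depth_high, k)        # high frequencies (edges) from the high-res map
    return base + detail
```

A practical pipeline would first align the two predictions (for instance with a scale-and-shift fit) before blending, since networks run at different input resolutions rarely agree on relative scale.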

Paper Structure (11 sections, 5 equations, 3 figures, 3 tables)

Introduction
Method
Overview of the Scale-aware Depth Estimation Framework
Enhancing Monocular Depth via Multi-Resolution Fusion
Pose Estimation of Instruments with Geometric Modeling
Scale Recovery by 3D Poses
Experiments
Results
Conclusion
Acknowledgments.
Disclosure of Interests.

Figures (3)

Figure 1: Overview of our proposed scale-aware monocular depth estimation framework, which consists of modules for relative depth estimation, surgical instrument pose estimation with geometric modeling, and scale recovery.
Figure 2: Qualitative comparisons on in-house data. Our method outperforms EndoSfM (EndoS) ozyoruk2021endoslam, AF-SfMLearner (AF) shao2022self, ManyDepth (ManyD) watson2021temporal, Depth Anything (DepthA) yang2024depth, DPT ranftl2021vision, and MonoDepth Stereo (Stereo) recasens2021endo in depth quality.
Figure 3: Qualitative comparison of 3D pose estimation. Green cylinders are renderings of the computed poses of the surgical tools.
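The pose estimation summarized in the abstract and illustrated in Figure 3 recovers each tool from its two image boundaries and its tip. One classical way to do this with algebraic geometry is sketched below as an assumption, not as the authors' exact formulation. It models the shaft as a cylinder of known radius r: each boundary line l back-projects to a plane through the camera centre with normal n = K^T l that is tangent to the cylinder, so the axis direction is d = n1 × n2, and the axis point p closest to the camera satisfies n1·p = n2·p = r and d·p = 0. The tip is taken where the back-projected tip ray meets the axis. The intrinsics `K`, the known radius, the interior pixel used to orient the normals, and all function names are hypothetical inputs.

```python
import numpy as np


def plane_normal_from_line(line, K, inside_px):
    """Unit normal of the plane through the camera centre that projects to `line`.

    line      : homogeneous image line (a, b, c) with a*u + b*v + c = 0
    inside_px : a pixel (u, v) on the instrument, used to orient the normal so the
                cylinder lies on its positive side (hypothetical input)
    """
    line = np.asarray(line, dtype=float)
    if line @ np.array([inside_px[0], inside_px[1], 1.0]) < 0:
        line = -line
    n = K.T @ line  # points X with l^T (K X) = 0 form the plane n . X = 0
    return n / np.linalg.norm(n)


def cylinder_pose(line1, line2, tip_px, radius, K, inside_px):
    """Axis direction, closest axis point and 3D tip of a cylindrical instrument."""
    n1 = plane_normal_from_line(line1, K, inside_px)
    n2 = plane_normal_from_line(line2, K, inside_px)
    d = np.cross(n1, n2)  # the axis is parallel to both tangent planes
    d /= np.linalg.norm(d)
    # the axis lies at signed distance +radius from each tangent plane;
    # d . p = 0 selects the axis point closest to the camera centre
    A = np.vstack([n1, n2, d])
    p = np.linalg.solve(A, np.array([radius, radius, 0.0]))
    # intersect the back-projected tip ray t * v with the axis p + s * d (least squares)
    v = np.linalg.solve(K, np.array([tip_px[0], tip_px[1], 1.0]))
    (_t, s), *_ = np.linalg.lstsq(np.column_stack([v, -d]), p, rcond=None)
    tip = p + s * d
    return d, p, tip
```

Solving a 3x3 linear system keeps this step cheap, but it becomes ill-conditioned when n1 and n2 are nearly (anti)parallel, for example for a thin shaft far from the camera, so accurate boundary lines matter.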