Endoscopic Depth Estimation Based on Deep Learning: A Survey
Ke Niu, Zeyun Liu, Xue Feng, Heng Li, Qika Lin, Kaize Shi
TL;DR
This survey reviews deep learning-based endoscopic depth estimation, organizing the literature around data, methods, and clinical applications. It catalogs dataset types (synthetic, phantom, surgical), distinguishes monocular from stereo approaches, and classifies supervision strategies (supervised, semi-supervised, self-supervised, domain adaptation). Key contributions include a structured taxonomy of methods, a summary of standard evaluation metrics, and a discussion of challenges and future directions, notably multimodal data fusion and foundation-model-driven knowledge integration. The findings highlight significant progress in real-time 3D reconstruction and navigation, but also underscore data scarcity, generalization gaps, and the need for robust clinical validation and explainability before widespread clinical translation is possible.
Abstract
Endoscopic depth estimation is a critical technology for improving the safety and precision of minimally invasive surgery. It has attracted considerable attention from researchers in medical imaging, computer vision, and robotics, and a large number of methods have been developed over the past decade. Although several related surveys exist, a comprehensive overview focused on recent deep learning-based techniques is still lacking. This paper aims to fill this gap by systematically reviewing the state-of-the-art literature. Specifically, we survey the field from three key perspectives: data, methods, and applications. First, at the data level, we describe the acquisition process of publicly available datasets. Second, at the methodological level, we introduce both monocular and stereo deep learning-based approaches for endoscopic depth estimation. Third, at the application level, we identify the specific challenges of deploying depth estimation in concrete clinical scenarios, together with the corresponding solutions. Finally, we outline potential directions for future research, such as domain adaptation, real-time implementation, and the fusion of depth information with sensor technologies, providing a starting point for researchers to engage with the field and advance it toward clinical translation.
