Confidence-aware Monocular Depth Estimation for Minimally Invasive Surgery

Muhammad Asad, Emanuele Colleoni, Pritesh Mehta, Nicolas Toussaint, Ricardo Sanchez-Matilla, Maria Robu, Faisal Bashir, Rahim Mohammadi, Imanol Luengo, Danail Stoyanov

TL;DR

A confidence-aware MDE framework enables improved accuracy of MDE models in MIS, addressing challenges posed by noise and artifacts in pre-clinical and clinical data, and allows MDE models to provide confidence maps that may be used to improve their reliability for clinical applications.

Abstract

Purpose: Monocular depth estimation (MDE) is vital for scene understanding in minimally invasive surgery (MIS). However, endoscopic video sequences are often contaminated by smoke, specular reflections, blur, and occlusions, limiting the accuracy of MDE models. In addition, current MDE models do not output depth confidence, which could be a valuable tool for improving their clinical reliability.

Methods: We propose a novel confidence-aware MDE framework featuring three significant contributions: (i) Calibrated confidence targets: an ensemble of fine-tuned stereo matching models is used to convert disparity variance into pixel-wise confidence probabilities; (ii) Confidence-aware loss: baseline MDE models are optimized with confidence-aware loss functions that use the pixel-wise confidence probabilities so that reliable pixels dominate training; and (iii) Inference-time confidence: a confidence estimation head with two convolution layers is proposed to predict per-pixel confidence at inference, enabling assessment of depth reliability.

Results: Comprehensive experimental validation across internal and public datasets demonstrates that our framework improves depth estimation accuracy and can robustly quantify the prediction's confidence. On the internal clinical endoscopic dataset (StereoKP), we improve dense depth estimation accuracy by ~8% compared to the baseline model.

Conclusion: Our confidence-aware framework enables improved accuracy of MDE models in MIS, addressing challenges posed by noise and artifacts in pre-clinical and clinical data, and allows MDE models to provide confidence maps that may be used to improve their reliability for clinical applications.
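
To make contributions (i) and (ii) concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify how disparity variance is mapped to a confidence probability or the exact form of the loss, so the exponential mapping `exp(-variance / scale)`, the `scale` parameter, the weighted L1 loss, and both function names are assumptions chosen for illustration only.

```python
import numpy as np


def confidence_from_ensemble(disparities, scale=1.0):
    """Map per-pixel ensemble variance to a confidence value in (0, 1].

    disparities: array of shape (M, H, W), one disparity map per ensemble member.
    scale: hypothetical temperature; the exp(-var / scale) mapping is an
           assumption, not the calibration used in the paper.
    """
    variance = np.var(disparities, axis=0)  # disagreement between members
    return np.exp(-variance / scale)        # zero variance -> confidence 1


def confidence_weighted_l1(pred, target, confidence, eps=1e-8):
    """L1 depth loss in which each pixel's error is scaled by its confidence.

    Normalising by the sum of confidences keeps the loss scale comparable
    across images with different amounts of unreliable pixels.
    """
    weighted_error = confidence * np.abs(pred - target)
    return weighted_error.sum() / (confidence.sum() + eps)


# Toy example: three ensemble members agree everywhere except pixel (0, 0).
disparities = np.full((3, 2, 2), 10.0)
disparities[:, 0, 0] = [8.0, 10.0, 12.0]

confidence = confidence_from_ensemble(disparities)
target = disparities.mean(axis=0)

# The same prediction error placed on an unreliable vs. a reliable pixel.
pred_noisy = target.copy()
pred_noisy[0, 0] += 3.0
pred_clean = target.copy()
pred_clean[1, 1] += 3.0

loss_noisy = confidence_weighted_l1(pred_noisy, target, confidence)
loss_clean = confidence_weighted_l1(pred_clean, target, confidence)
```

In this toy case the pixel where the ensemble disagrees receives a lower confidence, so an error at that pixel contributes less to the loss than an identical error at a pixel where the ensemble agrees. This is one way to read "reliable pixels dominate training".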

Paper Structure

This paper contains 15 sections, 3 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Factors such as image acquisition noise, lens contamination/smudges, blur, occluded views, and lighting issues in MIS datasets contribute towards unreliable depth data that impacts MDE models trained on MIS datasets. Depth confidence can be used to identify noise-free regions in confidence-aware training. Examples from the StereoKP dataset.
  • Figure 2: The proposed confidence-aware training methods. Our proposed blocks are shown in blue. Method (a) is used to add depth confidence to a depth dataset, whereas (b) shows our confidence-aware MDE training framework.
  • Figure 3: Comparison of frames from the different datasets used in this work. StereoKP, Hamlyn [recasens2021endo], and DaVinci [ye2017self] are from clinical settings, whereas both MicroCT-SE and MicroCT-PK are from lab settings. StereoKP and the MicroCT datasets are internal, whereas Hamlyn and DaVinci are public datasets.
  • Figure 4: Qualitative results on the StereoKP dataset, showing ground truth and predicted depth from the DAv1-B baseline and DAv1-B CA (proposed) models, along with the predicted confidence from the CA model. Arrows mark regions where the CA model performed notably better than the baseline.