A Critical Synthesis of Uncertainty Quantification and Foundation Models in Monocular Depth Estimation

Steven Landgraf; Rongjun Qin; Markus Ulrich

TL;DR

This work investigates how to make metric monocular depth estimation safer and more trustworthy by fusing five uncertainty quantification (UQ) methods with the DepthAnythingV2 foundation model. It finds that fine-tuning with the Gaussian Negative Log-Likelihood Loss (GNLL) yields reliable per-pixel uncertainty estimates while keeping depth accuracy and computational cost on par with the baseline, making it the most promising of the UQ approaches evaluated across four diverse datasets. The study emphasizes the importance of uncertainty-aware depth estimation for real-world deployment and discusses qualitative insights into where and why these uncertainties arise. By highlighting GNLL's strong performance and scalability, the paper lays groundwork for broader integration of uncertainty into foundation-model-driven perception, with potential extensions to semantic segmentation and pose estimation.

Abstract

While recent foundation models have enabled significant breakthroughs in monocular depth estimation, a clear path towards safe and reliable deployment in the real world remains elusive. Metric depth estimation, which involves predicting absolute distances, poses particular challenges, as even the most advanced foundation models remain prone to critical errors. Since quantifying uncertainty has emerged as a promising way to address these limitations and enable trustworthy deployment, we fuse five different uncertainty quantification methods with the current state-of-the-art DepthAnythingV2 foundation model. To cover a wide range of metric depth domains, we evaluate their performance on four diverse datasets. Our findings identify fine-tuning with the Gaussian Negative Log-Likelihood Loss (GNLL) as a particularly promising approach, offering reliable uncertainty estimates while maintaining predictive performance and computational efficiency, in both training and inference time, on par with the baseline. By fusing uncertainty quantification and foundation models within the context of monocular depth estimation, this paper lays a critical foundation for future research aimed at improving not only model performance but also its explainability. Extending this critical synthesis of uncertainty quantification and foundation models to other crucial tasks, such as semantic segmentation and pose estimation, presents exciting opportunities for safer and more reliable machine vision systems.
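To make the GNLL idea concrete, the following is a minimal sketch of the standard per-pixel Gaussian negative log-likelihood, assuming the model predicts a depth mean and a log-variance for each pixel. The exact formulation used in the paper is not given in this summary, so the function name `gaussian_nll`, the log-variance parameterization, and the toy values are illustrative assumptions rather than the authors' implementation.

```python
import math


def gaussian_nll(mu, log_var, target):
    """Mean per-pixel Gaussian NLL, with the constant 0.5*log(2*pi) dropped.

    mu, log_var and target are equal-length sequences of floats
    (a hypothetical flattened depth map, its predicted log-variance,
    and the ground-truth depth).
    """
    if not (len(mu) == len(log_var) == len(target)):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for m, s, y in zip(mu, log_var, target):
        # 0.5 * (log sigma^2 + (y - mu)^2 / sigma^2), with s = log sigma^2
        total += 0.5 * (s + (y - m) ** 2 * math.exp(-s))
    return total / len(mu)


# Toy values (assumed, in metres): the third pixel is badly mispredicted.
pred_depth = [2.0, 5.0, 9.0]
true_depth = [2.1, 5.0, 7.0]

# Case A: the model claims low variance everywhere, including the bad pixel.
confident = [math.log(0.01)] * 3
# Case B: the model reports high variance only on the bad pixel.
hedged = [math.log(0.01), math.log(0.01), math.log(4.0)]

loss_confident = gaussian_nll(pred_depth, confident, true_depth)
loss_hedged = gaussian_nll(pred_depth, hedged, true_depth)
```

In this toy case the loss is lower when the model reports a large variance on the pixel it gets wrong, which is the mechanism by which such a loss can encourage uncertainty estimates that track prediction error.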

Paper Structure (17 sections, 7 equations, 8 figures, 6 tables)

Introduction
Related Work
Monocular Depth Estimation
Uncertainty Quantification
DepthAnything Foundation Model
Methodology
Overview
Learned Confidence
Gaussian Negative Log-Likelihood Loss
MC Dropout
Sub-Ensemble
Test-Time Augmentation
Experiments
Experimental Setup
Quantitative Evaluation
...and 2 more sections

Figures (8)

Figure 1: Qualitative example of a fine-tuned DepthAnythingV2 for metric monocular depth estimation on the NYUv2 dataset (Silberman et al., 2012), using a ViT-S encoder and Monte Carlo Dropout for an additional uncertainty estimate. The binary accuracy map is based on the $\delta_1$ error. The strong correlation between erroneous predictions and high uncertainties highlights the potential of integrating uncertainty quantification (UQ) methods with foundation models for monocular depth estimation (MDE).
Figure 2: A schematic overview of how to fuse the five different uncertainty quantification approaches with the DepthAnythingV2 (Yang et al., 2024) foundation model.
Figure (unnumbered panels): Input Image; Ground Truth
...and 3 more figures
