Robust Monocular Depth Estimation under Challenging Conditions

Stefano Gasperini; Nils Morbitzer; HyunJun Jung; Nassir Navab; Federico Tombari

TL;DR

This paper addresses monocular depth estimation under challenging illumination and weather, where conventional self-supervised and supervised methods become unreliable. It introduces md4all, a training-time strategy that uses image-to-image translation to turn easy, day-like samples into adverse ones, feeds the translated samples to the model, and computes the losses on the original easy samples so that the training signal stays reliable. The authors present two self-supervised variants, Always Daytime (AD) and Day Distillation (DD), and extend the approach to supervised learning. On nuScenes and Oxford RobotCar, md4all improves over state-of-the-art baselines by a large margin in night, rain, and standard conditions, both quantitatively and qualitatively, with no changes to the model at inference time. Source code and data, including the translated images, are publicly available to support further research and deployment in safety-critical applications.

Abstract

While state-of-the-art monocular depth estimation approaches achieve impressive results in ideal settings, they are highly unreliable under challenging illumination and weather conditions, such as at nighttime or in the presence of rain. In this paper, we uncover these safety-critical issues and tackle them with md4all: a simple and effective solution that works reliably under both adverse and ideal conditions, as well as for different types of learning supervision. We achieve this by exploiting the efficacy of existing methods under perfect settings. Therefore, we provide valid training signals independently of what is in the input. First, we generate a set of complex samples corresponding to the normal training ones. Then, we train the model by guiding its self- or full-supervision by feeding the generated samples and computing the standard losses on the corresponding original images. Doing so enables a single model to recover information across diverse conditions without modifications at inference time. Extensive experiments on two challenging public datasets, namely nuScenes and Oxford RobotCar, demonstrate the effectiveness of our techniques, outperforming prior works by a large margin in both standard and challenging conditions. Source code and data are available at: https://md4all.github.io.
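
The training idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the list-based toy "images", and the plain L1 loss are all assumptions made for illustration, and the mixing probability `x` is read from the Figure 4 caption as the share of translated inputs. The sketch shows the distillation-style variant, in which a frozen model sees only the easy sample while the trained model may see its translated counterpart.

```python
import random
from typing import Callable, Sequence

Image = Sequence[float]  # toy stand-in for an image tensor (assumption)
DepthModel = Callable[[Image], Sequence[float]]


def pick_model_input(easy: Image, translated: Image, x: float,
                     rng: random.Random) -> Image:
    """Return the translated sample with probability x, else the easy one.

    With x = 0 the model only ever sees easy samples.
    """
    return translated if rng.random() < x else easy


def distillation_step(student: DepthModel, frozen_teacher: DepthModel,
                      easy: Image, translated: Image, x: float,
                      rng: random.Random) -> float:
    """One hypothetical training step; returns a mean absolute error.

    The guidance is always computed on the easy sample, regardless of
    which input the student receives. The L1 loss is an assumption.
    """
    target = frozen_teacher(easy)
    fed = pick_model_input(easy, translated, x, rng)
    prediction = student(fed)
    return sum(abs(p - t) for p, t in zip(prediction, target)) / len(target)
```

Because the target never depends on the translated input, the loss stays well defined even when the translated image is dark or noisy, which is the property the abstract refers to as providing valid training signals independently of what is in the input.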

Paper Structure (41 sections, 6 equations, 18 figures, 17 tables)

Introduction
Related Work
Supervised Monocular Depth Estimation
Self-Supervised Monocular Depth Estimation
Solutions to Inherent Issues
Method
md4all - Self-Supervised
Self-Supervised Baseline
md4all-AD: Always Daytime, No Bad Weather
md4all-DD: Day Distillation
md4all - Supervised
Experiments and Results
Experimental Setup
Quantitative Results
Qualitative Results
...and 26 more sections

Figures (18)

Figure 1: Predictions in challenging settings [caesar2020nuscenes] for self-supervised [godard2019monodepth2] and supervised [bhat2021adabins] methods. Standard approaches fail due to training assumptions or sensor artifacts. Under both supervisions, our md4all makes the same models robust in all conditions.
Figure 2: Detrimental factors to monocular depth estimation in difficult settings from nuScenes [caesar2020nuscenes]. Self-supervised works have issues with textureless areas, reflections, and noise. Supervised ones learn artifacts from the ground truth sensor (LiDAR is shown).
Figure 3: Our md4all-DD framework. The frozen day-depth model estimates on easy samples and provides guidance to another model fed with a mix of easy and translated inputs. Inference is done with a simple single model for both fully- and self-supervised md4all.
Figure 4: Our self-supervised md4all-AD framework. With $x=0$, it is equivalent to the day-depth model in Figure 3 and the baseline. The depth model is trained with a mix of easy and translated samples, while the training signal is always from the easy ones.
Figure 5: Comparison on nuScenes [caesar2020nuscenes] between fully-sup. AdaBins [bhat2021adabins] w/o and w/ ours, and self-sup. Monodepth2 [godard2019monodepth2] w/o and w/ ours.
...and 13 more figures
