A Comparative Study on Multi-task Uncertainty Quantification in Semantic Segmentation and Monocular Depth Estimation

Steven Landgraf; Markus Hillemann; Theodor Kapler; Markus Ulrich

TL;DR

The paper addresses uncertainty quantification in a multi-task setting that jointly performs semantic segmentation and monocular depth estimation. It systematically compares Monte Carlo Dropout, Deep Sub-Ensembles, and Deep Ensembles using transformer-based baselines and the SegDepthFormer architecture, demonstrating that Deep Ensembles provide the best predictive performance and uncertainty quality, particularly under out-of-domain conditions. It shows that multi-task learning can improve the uncertainty metrics for segmentation and identifies the median uncertainty as a robust default threshold; the choice of threshold substantially affects the metrics, but it does so consistently across methods. The findings offer practical guidance for deploying uncertainty-aware multi-task perception systems in safety-critical applications, highlighting Deep Ensembles as the most reliable option and suggesting a ten-member ensemble as a reasonable trade-off between performance and computational cost. Overall, the work advances the understanding of how multi-task uncertainty quantification behaves in modern architectures and under domain shift, with clear implications for robust autonomous perception systems.
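All three compared methods produce several predictions per pixel (ensemble members, Monte Carlo Dropout forward passes, or sub-ensemble heads) that have to be fused into one prediction plus an uncertainty estimate. The following is a minimal sketch of that fusion step for a single pixel. It assumes predictive entropy of the averaged softmax as the segmentation uncertainty and the standard deviation across members as the depth uncertainty; whether the paper uses exactly these measures is an assumption here, and the function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def fuse_segmentation(member_probs: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Fuse per-member softmax vectors for ONE pixel (hypothetical helper).

    Returns the mean class probabilities and their predictive entropy,
    which is ASSUMED here as the segmentation uncertainty measure.
    """
    num_classes = len(member_probs[0])
    mean_probs = [statistics.fmean(p[c] for p in member_probs) for c in range(num_classes)]
    # Entropy of the averaged distribution; terms with p == 0 contribute 0.
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)
    return mean_probs, entropy


def fuse_depth(member_depths: Sequence[float]) -> Tuple[float, float]:
    """Fuse per-member depth predictions for ONE pixel (hypothetical helper).

    Returns the mean depth and the population standard deviation across
    members, which is ASSUMED here as the depth uncertainty measure.
    """
    return statistics.fmean(member_depths), statistics.pstdev(member_depths)


if __name__ == "__main__":
    # Two members favouring opposite classes: the averaged distribution is
    # uniform, so the entropy reaches its two-class maximum, log(2).
    print(fuse_segmentation([[0.75, 0.25], [0.25, 0.75]]))
    # Depth predictions of 10 m and 12 m: mean 11 m, spread 1 m.
    print(fuse_depth([10.0, 12.0]))
```

Averaging probabilities before taking the entropy means that disagreement between members raises the uncertainty even when every individual member is confident, which is the property that makes ensembles attractive for out-of-domain inputs.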

Abstract

Deep neural networks excel in perception tasks such as semantic segmentation and monocular depth estimation, making them indispensable in safety-critical applications like autonomous driving and industrial inspection. However, they often suffer from overconfidence and poor explainability, especially for out-of-domain data. While uncertainty quantification has emerged as a promising solution to these challenges, its application to multi-task settings has yet to be explored. To shed light on this, we evaluate Monte Carlo Dropout, Deep Sub-Ensembles, and Deep Ensembles for joint semantic segmentation and monocular depth estimation. We find that Deep Ensembles stand out as the preferred choice, particularly in out-of-domain scenarios, and show the potential benefit of multi-task learning for uncertainty quality compared to solving both tasks separately. Additionally, we highlight the impact of the uncertainty threshold used to classify pixels as certain or uncertain, with the median uncertainty emerging as a robust default.
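Classifying pixels as certain or uncertain splits them into four groups (accurate or inaccurate, certain or uncertain), from which the three uncertainty metrics are computed. The sketch below uses the standard confusion-count definitions of $p(\text{accurate}|\text{certain})$, $p(\text{uncertain}|\text{inaccurate})$, and PAvPU, with the median uncertainty as the default threshold. Assumptions: a pixel counts as certain when its uncertainty is at most the threshold, and the per-pixel accuracy mask is supplied by the caller (for depth this requires an error criterion that is not stated here). The function name is hypothetical.

```python
import statistics
from typing import Dict, Optional, Sequence


def uncertainty_metrics(
    accurate: Sequence[bool],
    uncertainty: Sequence[float],
    threshold: Optional[float] = None,
) -> Dict[str, float]:
    """Pixel-wise uncertainty metrics for a flattened set of pixels (hypothetical helper).

    A pixel is 'certain' when its uncertainty is <= threshold (ASSUMED convention).
    Without an explicit threshold, the median uncertainty is used.
    """
    if threshold is None:
        threshold = statistics.median(uncertainty)
    # Counts: accurate-certain, accurate-uncertain, inaccurate-certain, inaccurate-uncertain.
    n_ac = n_au = n_ic = n_iu = 0
    for acc, u in zip(accurate, uncertainty):
        certain = u <= threshold
        if acc and certain:
            n_ac += 1
        elif acc:
            n_au += 1
        elif certain:
            n_ic += 1
        else:
            n_iu += 1
    nan = float("nan")  # returned when a conditioning set is empty
    return {
        "p(accurate|certain)": n_ac / (n_ac + n_ic) if n_ac + n_ic else nan,
        "p(uncertain|inaccurate)": n_iu / (n_ic + n_iu) if n_ic + n_iu else nan,
        "PAvPU": (n_ac + n_iu) / (n_ac + n_au + n_ic + n_iu),
    }


if __name__ == "__main__":
    # Uncertainty perfectly separates accurate from inaccurate pixels: all metrics are 1.0.
    print(uncertainty_metrics([True, True, False, False], [0.1, 0.2, 0.3, 0.4]))
```

One plausible reason the median works as a default is that it always splits the pixels roughly in half, independent of the scale of each method's uncertainty values, so it needs no per-method tuning.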

Paper Structure

This paper contains 14 sections, 8 equations, 3 figures, and 6 tables.

Introduction
Related Work
Joint Semantic Segmentation and Monocular Depth Estimation
Uncertainty Quantification
Evaluation Strategy
Baseline Models
Uncertainty Quantification
Experimental Setup
Experiments
Quantitative Results
Impact of Ensemble Members
Impact of Uncertainty Threshold
Out-of-Domain Evaluation
Conclusion

Figures (3)

Figure 1: A schematic overview of the SegDepthFormer architecture. It combines the SegFormer architecture [xie2021segformer] with a lightweight all-MLP depth decoder.
Figure 2: Impact of the number of ensemble members on the predictive performance and uncertainty quality of a SegDepthFormer Deep Ensemble on the Cityscapes dataset [cordts2016CityscapesDataset].
Figure 3: Out-of-domain (OOD) uncertainty quality of the baseline SegDepthFormer and a SegDepthFormer Deep Ensemble (DE) with 10 members on the Rain$_3$ Cityscapes validation set [hu2019depth]. Rain$_3$ uses attenuation coefficients $\alpha = 0.03$ and $\beta = 0.015$ and raindrop radius $a = 0.002$. We compare the three uncertainty metrics $p(\text{accurate}|\text{certain})$, $p(\text{uncertain}|\text{inaccurate})$, and PAvPU across different uncertainty thresholds and additionally report the area under the curve (AUC).
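Figure 3 condenses each metric-versus-threshold curve into a single AUC value. The sketch below shows one way such a sweep could be computed. Assumptions: uncertainties are min-max normalized to [0, 1], thresholds are evenly spaced on that interval, a pixel is certain when its normalized uncertainty is at most the threshold, and the area is computed with the trapezoidal rule; the paper's actual threshold grid and normalization are not given here, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def metrics_at(accurate: Sequence[bool], unc: Sequence[float], t: float) -> Dict[str, float]:
    """The three uncertainty metrics at one threshold; 'certain' means unc <= t (ASSUMED)."""
    n_ac = n_au = n_ic = n_iu = 0
    for acc, u in zip(accurate, unc):
        certain = u <= t
        if acc and certain:
            n_ac += 1
        elif acc:
            n_au += 1
        elif certain:
            n_ic += 1
        else:
            n_iu += 1
    return {
        "p(accurate|certain)": n_ac / (n_ac + n_ic) if n_ac + n_ic else math.nan,
        "p(uncertain|inaccurate)": n_iu / (n_ic + n_iu) if n_ic + n_iu else math.nan,
        "PAvPU": (n_ac + n_iu) / (n_ac + n_au + n_ic + n_iu),
    }


def trapezoid(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Area under the piecewise-linear curve through the points (xs[i], ys[i])."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))


def threshold_sweep_auc(
    accurate: Sequence[bool], uncertainty: Sequence[float], num_thresholds: int = 11
) -> Dict[str, float]:
    """Sweep evenly spaced thresholds over normalized uncertainty and return each metric's AUC."""
    lo, hi = min(uncertainty), max(uncertainty)
    span = (hi - lo) or 1.0  # avoid division by zero when all uncertainties are equal
    norm = [(u - lo) / span for u in uncertainty]
    thresholds: List[float] = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    curves = [metrics_at(accurate, norm, t) for t in thresholds]
    return {name: trapezoid(thresholds, [c[name] for c in curves]) for name in curves[0]}


if __name__ == "__main__":
    # One accurate low-uncertainty pixel and one inaccurate high-uncertainty pixel.
    print(threshold_sweep_auc([True, False], [0.0, 1.0]))
```

Because the smallest normalized uncertainty is 0 and every threshold is at least 0, at least one pixel is always certain, so $p(\text{accurate}|\text{certain})$ is defined at every point of the sweep; $p(\text{uncertain}|\text{inaccurate})$ becomes NaN only if no pixel is inaccurate. In the two-pixel toy example, the PAvPU AUC is about 0.975 with 11 thresholds.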
