A Comparative Study on Multi-task Uncertainty Quantification in Semantic Segmentation and Monocular Depth Estimation
Steven Landgraf, Markus Hillemann, Theodor Kapler, Markus Ulrich
TL;DR
The paper studies uncertainty quantification in a multi-task setting that jointly performs semantic segmentation and monocular depth estimation. It systematically compares Monte Carlo Dropout, Deep Sub-Ensembles, and Deep Ensembles using transformer-based baselines and a SegDepthFormer architecture, and finds that Deep Ensembles provide the best predictive performance and uncertainty quality, particularly under out-of-domain conditions. It also shows that multi-task learning can improve uncertainty metrics for segmentation compared with solving the tasks separately. The choice of uncertainty threshold strongly affects the reported metrics, but it does so consistently across methods, and the median uncertainty emerges as a robust default. For deploying uncertainty-aware multi-task perception in safety-critical applications, the findings point to Deep Ensembles as the most reliable option and suggest a ten-member ensemble as a reasonable balance of performance and cost.
Abstract
Deep neural networks excel in perception tasks such as semantic segmentation and monocular depth estimation, making them indispensable in safety-critical applications like autonomous driving and industrial inspection. However, they often suffer from overconfidence and poor explainability, especially for out-of-domain data. While uncertainty quantification has emerged as a promising solution to these challenges, multi-task settings have yet to be explored. To shed light on this, we evaluate Monte Carlo Dropout, Deep Sub-Ensembles, and Deep Ensembles for joint semantic segmentation and monocular depth estimation. We find that Deep Ensembles stand out as the preferred choice, particularly in out-of-domain scenarios, and show the potential benefit of multi-task learning for uncertainty quality compared with solving both tasks separately. Additionally, we highlight the impact of employing different uncertainty thresholds to classify pixels as certain or uncertain, with the median uncertainty emerging as a robust default.
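To make the thresholding idea concrete, the following is a minimal sketch of how ensemble outputs could be turned into per-pixel uncertainties and then split into certain and uncertain pixels at the median. It is an illustration under stated assumptions, not the authors' implementation: the uncertainty measures (predictive entropy for segmentation, standard deviation across members for depth), the array shapes, and all function names are hypothetical choices made for this example.

```python
import numpy as np


def segmentation_uncertainty(member_probs):
    """Average softmax outputs of M ensemble members and compute entropy.

    member_probs: array of shape (M, H, W, C) with per-member class
    probabilities (assumed layout, not taken from the paper).
    Returns the predicted label map (H, W) and predictive entropy (H, W).
    """
    mean_probs = member_probs.mean(axis=0)
    # Clip before the log so that zero probabilities do not produce -inf.
    log_probs = np.log(np.clip(mean_probs, 1e-12, 1.0))
    entropy = -np.sum(mean_probs * log_probs, axis=-1)
    return mean_probs.argmax(axis=-1), entropy


def depth_uncertainty(member_depths):
    """Mean depth and spread across members.

    member_depths: array of shape (M, H, W) with per-member depth maps.
    The standard deviation across members is an assumed uncertainty
    measure chosen for illustration.
    """
    return member_depths.mean(axis=0), member_depths.std(axis=0)


def certain_mask(uncertainty):
    """Mark pixels as certain if their uncertainty is at most the median."""
    return uncertainty <= np.median(uncertainty)
```

With a median threshold, roughly half of the pixels in each image are labelled certain regardless of the absolute scale of the uncertainty values, which is one plausible reason it behaves as a stable default across methods.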
