Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
Alex Kendall, Yarin Gal, Roberto Cipolla
TL;DR
The paper addresses the challenge of balancing losses in multi-task learning for scene understanding by introducing a probabilistic weighting scheme based on homoscedastic (task-dependent) uncertainty. It derives a joint likelihood for regression and classification tasks, enabling the model to learn per-task weights via learnable noise parameters and a stable log-variance representation. The approach is implemented in a unified encoder-decoder architecture that performs semantic, instance, and depth predictions from monocular images, achieving superior performance over single-task models and naïve loss weighting. The findings highlight dynamic, data-driven task weighting during training and demonstrate practical benefits for real-time, multi-task scene understanding systems.
Abstract
Numerous deep learning applications benefit from multi-task learning with multiple regression and classification objectives. In this paper we make the observation that the performance of such systems is strongly dependent on the relative weighting between each task's loss. Tuning these weights by hand is a difficult and expensive process, making multi-task learning prohibitive in practice. We propose a principled approach to multi-task deep learning which weighs multiple loss functions by considering the homoscedastic uncertainty of each task. This allows us to simultaneously learn various quantities with different units or scales in both classification and regression settings. We demonstrate our model learning per-pixel depth regression, semantic and instance segmentation from a monocular input image. Perhaps surprisingly, we show our model can learn multi-task weightings and outperform separate models trained individually on each task.
