Masked Gamma-SSL: Learning Uncertainty Estimation via Masked Image Modeling
David S. W. Williams, Matthew Gadd, Paul Newman, Daniele De Martini
TL;DR
The paper addresses the need for reliable runtime uncertainty estimates in semantic segmentation under distributional shift for safety-critical robotics. It introduces a three-stage framework: pretraining to obtain general representations, task-specific supervised learning on a labelled source domain, and uncertainty training on unlabelled target-domain data. The uncertainty training stage builds on Masked Image Modeling with a masking-based consistency objective, implemented via the masked consistency loss $L_c$ and the mask $M_\gamma^{\phi}$. A core contribution is the Mask-d2 model, which uses unlabelled data to produce high-quality uncertainty estimates in a single forward pass; it outperforms OoD and uncertainty estimation baselines on the SAX target domains and generalises to unseen domains such as WildDash. Because uncertainty comes from a single pass, the approach reduces runtime latency for safety-critical perception while providing calibrated uncertainty, enabling safer actuation and interaction in potentially unseen scenes.
Abstract
This work proposes a semantic segmentation network that produces high-quality uncertainty estimates in a single forward pass. We exploit general representations from foundation models and unlabelled datasets through a Masked Image Modeling (MIM) approach, which is robust to augmentation hyper-parameters and simpler than previous techniques. For neural networks used in safety-critical applications, bias in the training data can lead to errors; it is therefore crucial to understand a network's limitations at run time and act accordingly. To this end, we evaluate the proposed method on several test domains, including the SAX Segmentation benchmark, which provides labelled test data from dense urban, rural and off-road driving domains. The proposed method consistently outperforms uncertainty estimation and Out-of-Distribution (OoD) techniques on this difficult benchmark.
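To make the idea of a masking-based consistency objective concrete, the following is a minimal NumPy sketch, not the paper's definition of $L_c$ or $M_\gamma^{\phi}$, neither of which is specified on this page. It assumes one generic form of consistency: the segmentation of the unmasked image serves as a pseudo-label, and the segmentation of a patch-masked copy is penalised with cross-entropy for disagreeing with it. The names `random_patch_mask`, `masked_consistency_loss`, `patch`, `mask_ratio` and the `segment` callable are all hypothetical.

```python
import numpy as np


def random_patch_mask(height, width, patch, mask_ratio, rng):
    """Return an (height, width) boolean array; True marks hidden pixels.

    Hypothetical helper: each patch x patch cell is hidden independently
    with probability mask_ratio.
    """
    grid_h = -(-height // patch)  # ceiling division
    grid_w = -(-width // patch)
    hidden = rng.random((grid_h, grid_w)) < mask_ratio
    # Upsample the patch grid to pixel resolution, then crop to the image size.
    pixels = np.repeat(np.repeat(hidden, patch, axis=0), patch, axis=1)
    return pixels[:height, :width]


def softmax(logits, axis=0):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def masked_consistency_loss(segment, image, hidden, eps=1e-12):
    """Assumed consistency loss between unmasked and masked predictions.

    segment: callable mapping a (C, H, W) image to (K, H, W) class logits.
    image:   (C, H, W) float array.
    hidden:  (H, W) boolean array, True where pixels are masked out.
    """
    # Pseudo-labels come from the unmasked image.
    target = softmax(segment(image)).argmax(axis=0)          # (H, W)
    # Hidden pixels are replaced by zeros (an assumption of this sketch).
    masked_image = np.where(hidden[None], 0.0, image)
    probs = softmax(segment(masked_image))                   # (K, H, W)
    # Probability the masked prediction assigns to each pseudo-label.
    picked = np.take_along_axis(probs, target[None], axis=0)[0]
    return float(-np.log(picked + eps).mean())


# Toy usage: a fixed per-pixel linear classifier stands in for the network.
rng = np.random.default_rng(0)
class_weights = rng.normal(size=(3, 3))                      # (K, C)
toy_segment = lambda x: np.einsum("kc,chw->khw", class_weights, x)
toy_image = rng.normal(size=(3, 16, 16))
toy_mask = random_patch_mask(16, 16, patch=4, mask_ratio=0.5, rng=rng)
toy_loss = masked_consistency_loss(toy_segment, toy_image, toy_mask)
```

In a real training loop `segment` would be the segmentation network and the loss would be minimised over unlabelled target-domain images; how the paper turns such consistency into a per-pixel uncertainty estimate is not covered by this sketch.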
