Mitigating Distributional Shift in Semantic Segmentation via Uncertainty Estimation from Unlabelled Data

David S. W. Williams; Daniele De Martini; Matthew Gadd; Paul Newman

TL;DR

This article presents a segmentation network that detects errors caused by challenging test domains in a single forward pass, without any additional annotation. On a difficult benchmark based on the SAX Dataset, it consistently outperforms uncertainty estimation and out-of-distribution techniques.

Abstract

Knowing when a trained segmentation model encounters data that differs from its training data is important. Understanding and mitigating the effects of this distributional shift is central to deploying such models, from both a performance and an assurance perspective, and is a safety concern in applications such as autonomous vehicles (AVs). This work presents a segmentation network that can detect errors caused by challenging test domains without any additional annotation in a single forward pass. As annotation costs limit the diversity of labelled datasets, we use easy-to-obtain, uncurated and unlabelled data to learn to perform uncertainty estimation by selectively enforcing consistency over data augmentation. To evaluate this, a novel segmentation benchmark based on the SAX Dataset is used, which includes labelled test data spanning three autonomous-driving domains, ranging in appearance from dense urban to off-road. The proposed method, named Gamma-SSL, consistently outperforms uncertainty estimation and Out-of-Distribution (OoD) techniques on this difficult benchmark, by up to 10.7% in area under the receiver operating characteristic (ROC) curve and 19.2% in area under the precision-recall (PR) curve in the most challenging of the three scenarios.

Paper Structure (51 sections, 20 equations, 9 figures, 14 tables, 1 algorithm)

Introduction
Preliminaries on Uncertainty Estimation
Related Work
Epistemic Uncertainty Estimation
Deterministic Uncertainty Methods
Aleatoric Uncertainty Estimation
Out-of-Distribution Detection
Semi-Supervised Learning
System Overview
Segmentation Using Prototypes
Uncertainty Estimation using $\gamma$
Training objectives
Semi-Supervised Task
Calculating $\gamma$: Making inconsistent pixels uncertain
Learning $\texttt{E}$: Making certain pixels consistent
...and 36 more sections

Figures (9)

Figure 1: In an image from the SAX project (top), a horse (an object of unknown class) can be seen on the road. In the central image, this horse is poorly segmented, leading to a dangerous driving situation. However, the model proposed in this work expresses pixel-wise uncertainty (blacked-out pixels in the bottom image), thereby mitigating the poor segmentation and the dangerous situation more generally. Uncertainty is also expressed over unfamiliar greenery that the model struggles to consistently segment as either $\mathrm{vegetation}$ or $\mathrm{terrain}$ (classes defined in Cityscapes).
Figure 2: Depiction of simultaneous segmentation and uncertainty estimation for the model presented in this work. Pixel-wise features are extracted from an image by encoder $\texttt{E}$. Distances $\texttt{d}_{1:3}$ are calculated between each feature and prototypical features from each class $\texttt{p}_{1:3}$, known as prototypes. If any of $[\texttt{d}_1, \texttt{d}_2, \texttt{d}_3]$ is less than $\gamma$, the feature is $\mathrm{certain}$ and assigned the class of its closest prototype (denoted by the coloured pixel overlaid on the right); if not, the feature is assigned $\mathrm{uncertain}$ (denoted by the question mark in a white pixel). In this way, a 'safe region of operation' is defined in feature space: inside it, pixels are accurate and certain; outside it, they are uncertain and inaccurate.
Figure 3: The training regime of the proposed approach. The model parameters are updated by four losses: (a) $L_{\text{c}}$ (b) $L_{\text{u}}$ (c) $L_{\text{p}}$ (d) $L_{\text{s}}$. (a) For the pixels deemed certain by $M_\gamma$, $L_{\text{c}}$ maximises the consistency -- a proxy for accuracy -- over the segmentations $s'_{T}$, $s_{T}$ of augmented versions $\bar{x}'_T,\bar{x}_T$ of the original target domain image $x_{T}$. (b) $L_{\text{u}}$ softly constrains the features $z_{T}$ to be uniformly distributed on the unit-hypersphere. (c) $L_{\text{p}}$ maximises the distance between source prototypes $p_S$, i.e. spreads the mean embeddings of each class in the source domain dataset over the unit-sphere uniformly. (d) $L_{\text{s}}$ maximises the accuracy for the segmentations of the source images $x_S$ with respect to ground-truth labels. For each diagram, the networks coloured in aquamarine are updated by the losses, while the cross-hatched networks are not. Note that for diagrammatic clarity, the colour transforms are depicted as following $\bar{x}'_T,\bar{x}_T, \bar{x}_S$, whereas in reality and as described in the Semi-Supervised Task section: $\bar{x}'_T = \mathcal{C}_1 \circ \mathcal{T}^{\mathcal{L}}_{1} \circ \mathcal{T}^{\mathcal{G}}(x_{T})$, $\bar{x}_T = \mathcal{C}_2 \circ \mathcal{T}^{\mathcal{L}}_{2} \circ \mathcal{T}_1^{\mathcal{G}}(x_{T})$, $\bar{x}_S = \mathcal{C}_3 \circ \mathcal{T}^{\mathcal{L}}_{3} \circ \mathcal{T}_2^{\mathcal{G}}(x_{S})$. Best viewed in color.
Figure 4: Example images and corresponding segmentation masks from Cityscapes -- the source domain -- and the domains in the SAX Segmentation Test Dataset.
Figure 5: For each SAX domain, a row of plots describes the misclassification detection performance of a series of benchmarks and the proposed methods, $\mathrm{\gamma}\text{-}\mathrm{SSL}$ and $\mathrm{\gamma}\text{-}\mathrm{SSL_{iL}}$. Misclassification detection accuracy, $\mathrm{A_{MD}}$, and F-score, $\mathrm{F_{0.5}}$, aggregate performance into a single metric, where a larger value of each represents a more 'introspective' model. They are plotted versus $\mathrm{p(a,c)}$, the proportion of pixels that are $\mathrm{accurate}$ and $\mathrm{certain}$, as this represents the amount of accurate and useful semantic information the model can extract from images; also a metric maximised by the ideal model. Note that the maximum value of $\mathrm{p(a,c)}$ is equal to the segmentation accuracy, $\mathrm{max}\mathrm{[p(a, c)]} = \mathrm{p(accurate)}$. Best viewed in color.
...and 4 more figures
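
The inference rule described in the Figure 2 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `segment_with_uncertainty`, the `UNCERTAIN` label value, and the choice of cosine distance on L2-normalised features are assumptions made here for illustration. Only the thresholding of prototype distances against $\gamma$ is taken from the caption.

```python
import numpy as np

UNCERTAIN = -1  # hypothetical label value for pixels flagged as uncertain


def segment_with_uncertainty(features, prototypes, gamma):
    """Assign each feature to its closest prototype, or mark it uncertain.

    features:   (N, D) array of pixel-wise features
    prototypes: (K, D) array, one prototype per class
    gamma:      scalar distance threshold
    """
    # Assumption: features and prototypes are compared on the unit
    # hypersphere, so both are L2-normalised first.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)

    # Assumption: distance is cosine distance, 1 - cosine similarity.
    dist = 1.0 - f @ p.T                     # shape (N, K)

    nearest = dist.argmin(axis=1)            # class of the closest prototype
    certain = dist.min(axis=1) < gamma       # any d_k < gamma <=> min d_k < gamma

    labels = np.where(certain, nearest, UNCERTAIN)
    return labels, certain


# Toy example: three classes with axis-aligned prototypes.
prototypes = np.eye(3)
features = np.array([
    [2.0, 0.0, 0.0],   # aligned with prototype 0
    [0.0, 3.0, 0.1],   # very close to prototype 1
    [1.0, 1.0, 1.0],   # equally far from every prototype
])
labels, certain = segment_with_uncertainty(features, prototypes, gamma=0.1)
print(labels, certain)
```

In this toy example the first two features fall inside the 'safe region' around a prototype and receive that prototype's class, while the third lies outside every region and is marked uncertain. Because "any distance below $\gamma$" is equivalent to "the smallest distance below $\gamma$", a single minimum per pixel suffices.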
