Hinge-Wasserstein: Estimating Multimodal Aleatoric Uncertainty in Regression Tasks

Ziliang Xiong, Arvi Jonnarth, Abdelrahman Eldesokey, Joakim Johnander, Bastian Wandt, Per-Erik Forssén

TL;DR

This work proposes hinge-Wasserstein, a simple improvement of the Wasserstein loss that reduces the penalty for weak secondary modes during training. This enables prediction of complex distributions with multiple modes and allows training on datasets where full ground-truth distributions are not available.
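To make the idea concrete: for two distributions over the same ordered 1D bins, the Wasserstein-1 distance equals the L1 distance between their cumulative distributions (CDFs), scaled by the bin width. The sketch below computes that, plus a hinged variant. The hinge form is our illustrative assumption, not the paper's definition: it subtracts a margin `gamma_w` from each bin's CDF gap and clips at zero, so gaps smaller than the margin (such as those caused by a very weak secondary mode) cost nothing. The function names are hypothetical, and the default `gamma_w=0.01` only mirrors the $\gamma_W$ value quoted in Figure 5.

```python
from itertools import accumulate


def cdf(probs):
    """Running sum of a discrete distribution over ordered bins."""
    return list(accumulate(probs))


def wasserstein1(p, q, bin_width=1.0):
    """W1 between two histograms defined on the same ordered 1D bins.

    In one dimension, W1 equals the L1 distance between the two CDFs.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    return bin_width * sum(abs(a - b) for a, b in zip(cdf(p), cdf(q)))


def hinge_wasserstein1(p, q, gamma_w=0.01, bin_width=1.0):
    """HYPOTHETICAL hinge variant (our assumption, not the paper's formula):
    per-bin CDF gaps up to gamma_w are not penalized at all."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    gaps = (abs(a - b) for a, b in zip(cdf(p), cdf(q)))
    return bin_width * sum(max(0.0, g - gamma_w) for g in gaps)
```

For a one-hot target `[1, 0, 0, 0]` and a prediction `[0.995, 0, 0, 0.005]` with a tiny secondary mode, `wasserstein1` is about 0.015, while `hinge_wasserstein1` returns 0.0: under this assumed form, the weak mode is simply not penalized.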

Abstract

Computer vision systems deployed in safety-critical applications need to quantify the uncertainty of their outputs. We study regression from images to parameter values, where uncertainty is commonly captured by predicting probability distributions. In this context, we investigate the regression-by-classification paradigm, which can represent multimodal distributions without a prior assumption on the number of modes. Through experiments on a specifically designed synthetic dataset, we demonstrate that traditional loss functions lead to poor probability distribution estimates and severe overconfidence when full ground-truth distributions are not available. To alleviate these issues, we propose hinge-Wasserstein, a simple improvement of the Wasserstein loss that reduces the penalty for weak secondary modes during training. This enables prediction of complex distributions with multiple modes and allows training on datasets where full ground-truth distributions are not available. In extensive experiments, we show that the proposed loss leads to substantially better uncertainty estimation on two challenging computer vision tasks: horizon line detection and stereo disparity estimation.
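A minimal sketch of the regression-by-classification setup, assuming equal-width bins over a known parameter range, a Gaussian-smoothed one-hot target (as in the red ground-truth curves of Figure 2), a softmax output layer, and the entropy of the predicted histogram as the uncertainty score (the quantity used for the sparsification curves in Figure 4). The bin count, `sigma`, and all helper names are hypothetical choices for illustration, not the paper's settings.

```python
import math


def bin_centers(lo, hi, n_bins):
    """Centres of n_bins equal-width bins covering [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + (i + 0.5) * width for i in range(n_bins)]


def gaussian_target(value, centers, sigma):
    """Discretized, normalized Gaussian centred on the ground-truth value."""
    weights = [math.exp(-0.5 * ((c - value) / sigma) ** 2) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(logits):
    """Turn network logits into a histogram over the bins."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a more spread-out prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def decode_mode(probs, centers):
    """Point estimate: centre of the most probable bin (picks one mode)."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return centers[best]
```

Because the output is a full histogram rather than a single mean and variance, logits such as `[0, 5, 0, 0, 5, 0]` yield two separated peaks of equal height, a bimodal prediction like the boundary pixel in Figure 1, and the entropy rises accordingly.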

Paper Structure

This paper contains 19 sections, 14 equations, 7 figures, and 7 tables.

Figures (7)

  • Figure 1: Edge pixels are subject to multimodal aleatoric uncertainty. Left: three pixels in the left frame are marked with red stars. Right: predicted disparity distributions for the three pixels, located on (a) the foreground (door of the car), (b) a boundary (edge of the mirror), and (c) the background. The horizontal axis is disparity (same range in all three subfigures); the vertical axis is probability.
  • Figure 2: Horizon line detection should be framed as a probabilistic regression problem due to its inherently stochastic nature. Upper left: an image where the horizon line is easy to detect (red line) and direct regression would work. Upper middle and right: images where the horizon line is ambiguous. Bottom row: output probability distributions of the proposed method for the horizon line parameters $(\alpha,\rho)$ of the images above. Red: Gaussian-smoothed ground truth; blue: predicted density. Images are from the HLW dataset [workman2016hlw].
  • Figure 3: Example images from the synthetic dataset with controllable aleatoric uncertainty: (a) training set, with one or two lines per image; the annotation contains one line for unimodal training and both lines for multimodal training; (b) test set 1, with one line per image; and (c) test set 2, with two lines per image.
  • Figure 4: Sparsification error curves for the HLW task (lower is better). (a) $\alpha$, with entropy as the uncertainty measure and absolute error as the oracle. (b) The same setting for $\rho$. See Table \ref{tab:HLW} for AUSE (area under the sparsification error curve).
  • Figure 5: Density predictions for $\alpha$ from a model trained with hinge-$W_1$ ($\gamma_W=0.01$), evaluated on both test sets. Note: only unimodal ground truth was used during training. Blue shows the Gaussian-smoothed ground truth, and orange shows the predicted densities. (a) and (c) show examples where two output peaks overlap the ground truth; (b) shows that the model cannot distinguish two peaks that are too close together; (d) shows the model working well with unimodal ground truth.
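Sparsification curves rank test samples by predicted uncertainty, progressively discard the most uncertain fraction, and track the mean error of what remains; the oracle ranks by the true error instead, and AUSE is the area under the difference between the two curves (lower is better, as in Figure 4). The sketch below assumes a fixed number of equally spaced removal fractions, no normalization, and trapezoidal integration; the step count and function names are hypothetical, and the paper's evaluation code may differ in these details.

```python
def sparsification_curve(errors, scores, steps=20):
    """Mean error of the samples that remain after removing the fraction
    k/steps (k = 0, ..., steps-1) with the highest `scores`."""
    n = len(errors)
    if n == 0 or n != len(scores):
        raise ValueError("errors and scores must be non-empty and equal length")
    # Sample indices ordered from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    curve = []
    for k in range(steps):
        removed = n * k // steps  # always < n because k < steps
        kept = order[removed:]
        curve.append(sum(errors[i] for i in kept) / len(kept))
    return curve


def ause(errors, uncertainties, steps=20):
    """Area under the sparsification error curve (model minus oracle)."""
    model = sparsification_curve(errors, uncertainties, steps)
    oracle = sparsification_curve(errors, errors, steps)  # rank by true error
    gap = [m - o for m, o in zip(model, oracle)]
    dx = 1.0 / steps  # spacing between consecutive removal fractions
    # Trapezoidal rule over the sampled removal fractions.
    return sum((a + b) / 2.0 * dx for a, b in zip(gap, gap[1:]))
```

For example, with per-image absolute errors of $\alpha$ as `errors` and the entropies of the predicted $\alpha$ histograms as `uncertainties`, an AUSE near zero means the entropy orders the images almost exactly as the oracle does.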