Evaluating Bayesian Deep Learning Methods for Semantic Segmentation

Jishnu Mukhoti, Yarin Gal

TL;DR

The paper tackles the challenge of evaluating uncertainty in Bayesian deep learning for semantic segmentation. It proposes three specialized metrics and implements two Bayesian DeepLab-v3+ variants using MC dropout and Concrete dropout, evaluated on Cityscapes. Concrete dropout consistently outperforms MC dropout on the new metrics, and both Bayesian models exceed the deterministic baseline in uncertainty-aware performance. These results establish benchmarks for uncertainty quantification in safety-critical segmentation and motivate future work on downstream autonomous driving decisions.

Abstract

Deep learning has been revolutionary for computer vision and semantic segmentation in particular, with Bayesian Deep Learning (BDL) used to obtain uncertainty maps from deep models when predicting semantic classes. This information is critical when using semantic segmentation for autonomous driving, for example. Standard semantic segmentation systems have well-established evaluation metrics. However, with BDL's rising popularity in computer vision, we require new metrics to evaluate whether a BDL method produces better uncertainty estimates than another method. In this work we propose three such metrics to evaluate BDL models designed specifically for the task of semantic segmentation. We modify DeepLab-v3+, one of the state-of-the-art deep neural networks, and create its Bayesian counterpart using MC dropout and Concrete dropout as inference techniques. We then compare and test these two inference techniques on the well-known Cityscapes dataset using our suggested metrics. Our results provide new benchmarks for researchers to compare and evaluate their improved uncertainty quantification in pursuit of safer semantic segmentation.

Paper Structure

This paper contains 17 sections, 7 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: High level overview of the system proposed in this work. The input is first passed through a Bayesian neural network which produces pixel-wise predictions as well as pixel-wise uncertainty estimates. The ground truth labels, predictions and uncertainties are then sent to the performance evaluation module which returns the values of the metrics designed for evaluating the model.
  • Figure 2: Worked out example of computing the performance evaluation metrics for Bayesian models in semantic segmentation.
  • Figure 3: Plots of p(accurate|certain) and p(uncertain|inaccurate) for varying thresholds of uncertainty.
  • Figure 4: Plot of PAvPU for varying thresholds of uncertainty.
  • Figure 5: Qualitative results for semantic segmentation with uncertainty estimates on Cityscapes images. The results include images from the Cityscapes val set, the corresponding semantic segmentation results from our model and the predictive and epistemic uncertainties estimated through the predictive entropy and the mutual information metrics respectively. Darker shades indicate higher uncertainty.
  • ...and 2 more figures
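
The quantities named in the captions of Figures 3 to 5 can be illustrated with a small NumPy sketch. It is not the authors' implementation, and it rests on several assumptions: the metrics are computed per pixel rather than per patch, the count-based formulas are the commonly stated definitions of these metrics, predictive entropy and mutual information are estimated from T stochastic forward passes, and all function names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # avoids log(0); an illustrative choice, not from the paper


def predictive_entropy(probs):
    """Entropy of the mean softmax; probs has shape (T, H, W, C)."""
    mean = probs.mean(axis=0)
    return -(mean * np.log(mean + EPS)).sum(axis=-1)


def mutual_information(probs):
    """Predictive entropy minus the mean per-sample entropy."""
    per_sample = -(probs * np.log(probs + EPS)).sum(axis=-1)  # (T, H, W)
    return predictive_entropy(probs) - per_sample.mean(axis=0)


def _ratio(num, den):
    # Returns NaN when the conditioning set is empty.
    return float(num) / float(den) if den > 0 else float("nan")


def uncertainty_metrics(accurate, uncertainty, threshold):
    """Pixel-level version of the three metrics (assumption: patch size 1).

    accurate:    boolean array, True where prediction == label
    uncertainty: float array of the same shape
    threshold:   values above this count as 'uncertain'
    """
    uncertain = uncertainty > threshold
    n_ac = np.sum(accurate & ~uncertain)   # accurate and certain
    n_au = np.sum(accurate & uncertain)    # accurate and uncertain
    n_ic = np.sum(~accurate & ~uncertain)  # inaccurate and certain
    n_iu = np.sum(~accurate & uncertain)   # inaccurate and uncertain
    return {
        "p_accurate_given_certain": _ratio(n_ac, n_ac + n_ic),
        "p_uncertain_given_inaccurate": _ratio(n_iu, n_ic + n_iu),
        "pavpu": _ratio(n_ac + n_iu, n_ac + n_au + n_ic + n_iu),
    }


# Toy data: four pixels, three predicted correctly.
accurate = np.array([True, True, True, False])
uncertainty = np.array([0.1, 0.2, 0.9, 0.8])
metrics = uncertainty_metrics(accurate, uncertainty, threshold=0.5)

# Two stochastic passes that disagree completely on a single pixel.
disagree = np.array([[[[1.0, 0.0]]], [[[0.0, 1.0]]]])  # shape (2, 1, 1, 2)
mi = mutual_information(disagree)
```

Sweeping `threshold` over a range of values and plotting each returned metric would give curves of the kind described for Figures 3 and 4.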