Table of Contents
Fetching ...

Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding

Alex Kendall, Vijay Badrinarayanan, Roberto Cipolla

TL;DR

Bayesian SegNet extends SegNet to produce probabilistic pixel-wise segmentation by using dropout at test time to sample from the posterior over network weights. It provides per-pixel uncertainty estimates via sample variance and demonstrates that uncertainty-aware predictions yield better segmentation, especially on small datasets, and generalizes to other architectures like FCN and Dilation Network. The approach achieves 2-3% improvement across multiple state-of-the-art models on CamVid, SUN RGB-D, and Pascal VOC, while maintaining a relatively compact parameter count and enabling near real-time inference with parallel Monte Carlo sampling. The work highlights the value of uncertainty in semantic segmentation for decision-making and downstream learning tasks.

Abstract

We present a deep learning framework for probabilistic pixel-wise semantic segmentation, which we term Bayesian SegNet. Semantic segmentation is an important tool for visual scene understanding and a meaningful measure of uncertainty is essential for decision making. Our contribution is a practical system which is able to predict pixel-wise class labels with a measure of model uncertainty. We achieve this by Monte Carlo sampling with dropout at test time to generate a posterior distribution of pixel class labels. In addition, we show that modelling uncertainty improves segmentation performance by 2-3% across a number of state of the art architectures such as SegNet, FCN and Dilation Network, with no additional parametrisation. We also observe a significant improvement in performance for smaller datasets where modelling uncertainty is more effective. We benchmark Bayesian SegNet on the indoor SUN Scene Understanding and outdoor CamVid driving scenes datasets.

Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding

TL;DR

Bayesian SegNet extends SegNet to produce probabilistic pixel-wise segmentation by using dropout at test time to sample from the posterior over network weights. It provides per-pixel uncertainty estimates via sample variance and demonstrates that uncertainty-aware predictions yield better segmentation, especially on small datasets, and generalizes to other architectures like FCN and Dilation Network. The approach achieves 2-3% improvement across multiple state-of-the-art models on CamVid, SUN RGB-D, and Pascal VOC, while maintaining a relatively compact parameter count and enabling near real-time inference with parallel Monte Carlo sampling. The work highlights the value of uncertainty in semantic segmentation for decision-making and downstream learning tasks.

Abstract

We present a deep learning framework for probabilistic pixel-wise semantic segmentation, which we term Bayesian SegNet. Semantic segmentation is an important tool for visual scene understanding and a meaningful measure of uncertainty is essential for decision making. Our contribution is a practical system which is able to predict pixel-wise class labels with a measure of model uncertainty. We achieve this by Monte Carlo sampling with dropout at test time to generate a posterior distribution of pixel class labels. In addition, we show that modelling uncertainty improves segmentation performance by 2-3% across a number of state of the art architectures such as SegNet, FCN and Dilation Network, with no additional parametrisation. We also observe a significant improvement in performance for smaller datasets where modelling uncertainty is more effective. We benchmark Bayesian SegNet on the indoor SUN Scene Understanding and outdoor CamVid driving scenes datasets.

Paper Structure

This paper contains 16 sections, 3 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Bayesian SegNet. These examples show the performance of Bayesian SegNet on popular segmentation and scene understanding benchmarks: SUN song2015sun (left), CamVid brostow2009semantic (center column) and Pascal VOC everingham2010pascal (right). The system takes an RGB image as input (top), and outputs a semantic segmentation (middle row) and model uncertainty estimate, averaged across all classes (bottom row). We observe higher model uncertainty at object boundaries and with visually difficult objects. An online demo and source code can be found on our project webpage mi.eng.cam.ac.uk/projects/segnet/
  • Figure 2: A schematic of the Bayesian SegNet architecture. This diagram shows the entire pipeline for the system which is trained end-to-end in one step with stochastic gradient descent. The encoders are based on the 13 convolutional layers of the VGG-16 network simonyan2014very, with the decoder placing them in reverse. The probabilistic output is obtained from Monte Carlo samples of the model with dropout at test time. We take the variance of these softmax samples as the model uncertainty for each class.
  • Figure 3: Comparison of uncertainty with Monte Carlo dropout and uncertainty from softmax regression (c-e: darker colour represents larger value). This figure shows that softmax regression is only capable of inferring relative probabilities between classes. In contrast, dropout uncertainty can produce an estimate of absolute model uncertainty.
  • Figure 4: Global segmentation accuracy against number of Monte Carlo samples for both SegNet and SegNet-Basic. Results averaged over 5 trials, with two standard deviation error bars, are shown for the CamVid dataset. This shows that Monte Carlo sampling outperforms the weight averaging technique after approximately 6 samples. Monte Carlo sampling converges after around 40 samples with no further significant improvement beyond this point.
  • Figure 5: Bayesian SegNet results on CamVid road scene understanding dataset brostow2009semantic. The top row is the input image, with the ground truth shown in the second row. The third row shows Bayesian SegNet's segmentation prediction, with overall model uncertainty, averaged across all classes, in the bottom row (with darker colours indicating more uncertain predictions). In general, we observe high quality segmentation, especially on more difficult classes such as poles, people and cyclists. Where SegNet produces an incorrect class label we often observe a high model uncertainty.
  • ...and 3 more figures