Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding
Alex Kendall, Vijay Badrinarayanan, Roberto Cipolla
TL;DR
Bayesian SegNet extends SegNet to produce probabilistic pixel-wise segmentation by keeping dropout active at test time and drawing Monte Carlo samples from the posterior over network weights. The variance across samples provides a per-pixel uncertainty estimate. Modelling uncertainty also improves segmentation accuracy, especially on small datasets, and the method generalizes to other architectures such as FCN and Dilation Network. It achieves a 2-3% improvement across multiple state-of-the-art models on CamVid, SUN RGB-D, and Pascal VOC, while keeping a relatively compact parameter count and allowing near real-time inference when the Monte Carlo samples are computed in parallel. The work highlights the value of uncertainty in semantic segmentation for decision-making and downstream learning tasks.
Abstract
We present a deep learning framework for probabilistic pixel-wise semantic segmentation, which we term Bayesian SegNet. Semantic segmentation is an important tool for visual scene understanding, and a meaningful measure of uncertainty is essential for decision-making. Our contribution is a practical system which is able to predict pixel-wise class labels with a measure of model uncertainty. We achieve this by Monte Carlo sampling with dropout at test time to generate a posterior distribution of pixel class labels. In addition, we show that modelling uncertainty improves segmentation performance by 2-3% across a number of state-of-the-art architectures such as SegNet, FCN and Dilation Network, with no additional parametrisation. We also observe a significant improvement in performance for smaller datasets, where modelling uncertainty is more effective. We benchmark Bayesian SegNet on the indoor SUN Scene Understanding and outdoor CamVid driving scenes datasets.
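The test-time sampling procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `toy_stochastic_forward` is a hypothetical stand-in for a segmentation network with dropout left on, and the number of samples, the dropout probability, and the choice to average the variance over classes to obtain a single uncertainty map are all assumptions made for illustration.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def toy_stochastic_forward(features, weights, drop_prob, rng):
    """Hypothetical stand-in for a segmentation network with dropout active.

    features: (H, W, D) per-pixel features; weights: (D, C) linear classifier.
    Returns per-pixel class probabilities of shape (H, W, C).
    """
    keep = rng.random(features.shape) >= drop_prob   # fresh dropout mask per call
    dropped = features * keep / (1.0 - drop_prob)    # inverted dropout scaling
    return softmax(dropped @ weights, axis=-1)


def mc_dropout_predict(features, weights, num_samples=20, drop_prob=0.5, seed=0):
    """Monte Carlo dropout: average softmax samples, use their variance as uncertainty."""
    rng = np.random.default_rng(seed)
    samples = np.stack([
        toy_stochastic_forward(features, weights, drop_prob, rng)
        for _ in range(num_samples)
    ])                                               # (T, H, W, C)
    mean_probs = samples.mean(axis=0)                # (H, W, C) predictive mean
    labels = mean_probs.argmax(axis=-1)              # (H, W) predicted classes
    # Assumption: collapse per-class variance into one map by averaging over classes.
    uncertainty = samples.var(axis=0).mean(axis=-1)  # (H, W)
    return labels, mean_probs, uncertainty


# Toy input: a 4x6 "image" with 8 features per pixel and 3 classes.
demo_rng = np.random.default_rng(42)
features = demo_rng.normal(size=(4, 6, 8))
weights = demo_rng.normal(size=(8, 3))
labels, mean_probs, uncertainty = mc_dropout_predict(features, weights)
```

In a real network the stochastic passes are independent, so they could be evaluated as a single batch, which is what makes parallel sampling attractive for inference speed.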
