
Dropout Inference in Bayesian Neural Networks with Alpha-divergences

Yingzhen Li, Yarin Gal

TL;DR

This paper extends black-box alpha-divergence (BB-α) minimisation to dropout-based Bayesian neural networks, so that inference only requires changing the training loss, not the architecture. By reparameterising the BB-α energy with cavity distributions, it derives a tractable Monte Carlo (MC) objective that works directly with dropout and standard neural network losses, with the divergence controlled by the α parameter. Across regression and classification benchmarks, the approach improves uncertainty estimates and predictive performance over dropout VI. It also shows that epistemic uncertainty rises for adversarial inputs, which offers a potential detection signal. Finally, it analyses runtime trade-offs: using more MC samples improves calibration at extra computational cost, with results that depend on the architecture and dataset.
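
For orientation, the MC objective described above has roughly the following shape. This is a sketch in our own notation, not a transcription of the paper's equations: q(ω) is the dropout approximate posterior over the weights, p_0 the prior, and each ω̂_{n,k} ~ q(ω) is one of K sampled dropout masks for datapoint n.

```latex
\hat{\mathcal{L}}_{\alpha}(q)
  = \mathrm{KL}\left[q(\omega) \,\|\, p_0(\omega)\right]
  - \frac{1}{\alpha} \sum_{n=1}^{N}
    \log \frac{1}{K} \sum_{k=1}^{K} p\left(y_n \mid x_n, \hat{\omega}_{n,k}\right)^{\alpha},
  \qquad \hat{\omega}_{n,k} \sim q(\omega)

\lim_{\alpha \to 0} \; -\frac{1}{\alpha} \log \mathbb{E}_{q}\left[p(y_n \mid x_n, \omega)^{\alpha}\right]
  = -\mathbb{E}_{q}\left[\log p(y_n \mid x_n, \omega)\right]
```

The limit follows from log E[exp(α log p)] = α E[log p] + O(α²). This is why α → 0 recovers the expected log-likelihood term of VI, while α = 1 gives the negative MC predictive log-likelihood. In dropout networks the KL term is commonly approximated by L2 weight decay, and we assume that here.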

Abstract

To obtain uncertainty estimates with real-world Bayesian deep learning models, practical inference approximations are needed. Dropout variational inference (VI), for example, has been used for machine vision and medical applications, but VI can severely underestimate model uncertainty. Alpha-divergences are alternatives to VI's KL objective that are able to avoid VI's uncertainty underestimation. But these are hard to use in practice: existing techniques can only use Gaussian approximating distributions and require existing models to be changed radically, so they are of limited use for practitioners. We propose a re-parametrisation of the alpha-divergence objectives, deriving a simple inference technique which, together with dropout, can be easily implemented with existing models by simply changing the loss of the model. We demonstrate improved uncertainty estimates and accuracy compared to VI in dropout networks. We study our model's epistemic uncertainty far away from the data using adversarial images, showing that these can be distinguished from non-adversarial images by examining our model's uncertainty.
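
Because the method is applied "by simply changing the loss", a minimal sketch of such a loss may help. The version below is our own illustration for softmax classification. It is not the authors' Figure 1 snippet. It assumes the network has already been run K times on each input with independent dropout masks, and the names `bb_alpha_classification_loss`, `mc_logits` and `alpha` are hypothetical. It uses only the Python standard library so it runs as-is. A real implementation would operate on framework tensors and add the weight-decay term separately.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax of a single logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def bb_alpha_classification_loss(
    mc_logits: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[int],
    alpha: float,
) -> float:
    """Hypothetical BB-alpha data term for one minibatch (a sketch).

    mc_logits[n][k] are the logits for datapoint n under the k-th dropout
    mask; labels[n] is the integer class of datapoint n. Returns the
    minibatch average of
        -(1/alpha) * log((1/K) * sum_k p(y_n | x_n, w_k) ** alpha).
    The KL / weight-decay term is assumed to be added elsewhere.
    """
    if alpha <= 0:
        raise ValueError("alpha must be > 0 (alpha -> 0 approaches the VI term)")
    total = 0.0
    for samples, y in zip(mc_logits, labels):
        # log p(y_n | x_n, w_k) for each of the K dropout samples
        log_liks = [log_softmax(logits)[y] for logits in samples]
        # log of the MC average of p ** alpha, computed in log space
        log_mean_pow = logsumexp([alpha * ll for ll in log_liks]) - math.log(len(samples))
        total += -log_mean_pow / alpha
    return total / len(labels)


# Usage: one datapoint, K = 2 dropout samples that disagree.
batch_logits = [[[2.0, 0.0], [0.0, 0.0]]]
batch_labels = [0]
for a in (1e-6, 0.5, 1.0):
    print(a, bb_alpha_classification_loss(batch_logits, batch_labels, a))
```

Working in log space avoids underflow when p^α is very small. By Jensen's inequality the loss can only decrease as α grows from 0 towards 1 when the samples disagree. When all K samples agree, every α gives the same value, -log p.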

Paper Structure

This paper contains 18 sections, 26 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Code snippet for our induced classification loss.
  • Figure 2: Negative test-LL results for Bayesian NN regression. The lower the better. Best viewed in colour.
  • Figure 3: Test RMSE results for Bayesian NN regression. The lower the better. Best viewed in colour.
  • Figure 4: MNIST test accuracy and test log likelihood for a fully connected NN in a classification task.
  • Figure 5: MNIST test accuracy and test log likelihood for a convolutional neural network in a classification task.
  • ...and 3 more figures
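
Figures 2, 4 and 5 report test log-likelihoods. For dropout models such numbers are commonly estimated by averaging the predictive probability over K stochastic forward passes. The listing above does not state the paper's exact evaluation protocol, so the sketch below is an assumption. It also computes predictive entropy and mutual information, two common MC-dropout uncertainty summaries of the kind the abstract relies on to separate adversarial from clean images. We do not know which measure the paper itself uses. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax of a single logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mc_test_log_likelihood(
    mc_logits: Sequence[Sequence[Sequence[float]]], labels: Sequence[int]
) -> float:
    """Average over test points of log((1/K) * sum_k p(y | x, w_k))."""
    total = 0.0
    for samples, y in zip(mc_logits, labels):
        log_liks = [log_softmax(logits)[y] for logits in samples]
        total += logsumexp(log_liks) - math.log(len(samples))
    return total / len(labels)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats, skipping zero-probability classes."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def uncertainty(samples: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Predictive entropy and mutual information for one input.

    samples[k] are the logits under the k-th dropout mask. The mutual
    information (predictive entropy minus mean per-sample entropy) is
    large when the dropout samples disagree, a common epistemic measure.
    """
    probs = [[math.exp(v) for v in log_softmax(logits)] for logits in samples]
    k, n_classes = len(probs), len(probs[0])
    mean_probs = [sum(p[c] for p in probs) / k for c in range(n_classes)]
    pred_entropy = entropy(mean_probs)
    expected_entropy = sum(entropy(p) for p in probs) / k
    return pred_entropy, pred_entropy - expected_entropy


# Usage: confident, agreeing samples vs. confident but disagreeing samples.
agree = [[5.0, 0.0], [5.0, 0.0]]
disagree = [[5.0, 0.0], [0.0, 5.0]]
print(uncertainty(agree), uncertainty(disagree))
print(mc_test_log_likelihood([[[2.0, 0.0], [0.0, 0.0]]], [0]))
```

With K = 1 the log-likelihood estimate reduces to that of a single deterministic pass. Larger K gives a better MC estimate at the cost of K forward passes per input.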