Bayesian Neural Networks: An Introduction and Survey

Ethan Goan, Clinton Fookes

Abstract

Neural Networks (NNs) have provided state-of-the-art results for many challenging machine learning tasks such as detection, regression and classification across the domains of computer vision, speech recognition and natural language processing. Despite their success, they are often implemented in a frequentist scheme, meaning they are unable to reason about uncertainty in their predictions. This article introduces Bayesian Neural Networks (BNNs) and the seminal research regarding their implementation. Different approximate inference methods are compared, and used to highlight where future research can improve on current methods.

Paper Structure

This paper contains 13 sections, 52 equations, 8 figures, 1 algorithm.

Figures (8)

  • Figure 1: Comparison of neural network to traditional probabilistic methods for a regression task, with no training data in the purple region. (a) Regression output using a neural network with 2 hidden layers; (b) Regression using a Gaussian Process framework, with grey bar representing $\pm 2$ std. from expected value.
  • Figure 2: Example of a NN architecture with a single hidden layer for either binary classification or 1-D regression. Each node represents a neuron or a state where the summation and activation of input states is performed. Arrows are the parameters (weights) indicating the strength of connection between neurons.
  • Figure 3: Examples of commonly used activation functions in NNs. The output for each activation is shown in blue and the numerical derivative of each function is shown in red. These functions are (a) Sigmoid; (b) TanH; (c) ReLU; (d) Leaky-ReLU. Note the change in scale for the y-axis.
  • Figure 4: Graphical illustration of how the evidence plays a role in investigating different model hypotheses. The simple model $\mathcal{H}_1$ is able to predict a small range of data with greater strength, while the more complex model $\mathcal{H}_2$ is able to represent a larger range of data, though with lower probability. Adapted from mackay1992interp, mackay1992bayesian.
  • Figure 5: Graphical illustration of how the minimisation of the KL divergence between the approximate and true posterior maximises the lower bound on the evidence. As the KL Divergence between our approximate and true posterior is minimised, the ELBO $\mathcal{F}[q_{\mathbf{\theta}}]$ tightens to the log-evidence. Therefore maximising the ELBO is equivalent to minimising the KL divergence between the approximate and true posterior. Adapted from barber1998ensemble.
  • ...and 3 more figures
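
The relationship described in the caption of Figure 5 can be written out as a short worked identity. This is a sketch rather than a reproduction of the paper's own equations: the symbols $\boldsymbol{\omega}$ (network weights) and $\mathcal{D}$ (training data) are assumed here for illustration, while $\mathcal{F}[q_{\mathbf{\theta}}]$ is the ELBO as named in the caption.

$$
\log p(\mathcal{D}) = \mathcal{F}[q_{\mathbf{\theta}}] + \mathrm{KL}\big(q_{\mathbf{\theta}}(\boldsymbol{\omega}) \,\|\, p(\boldsymbol{\omega} \mid \mathcal{D})\big),
\qquad
\mathcal{F}[q_{\mathbf{\theta}}] = \mathbb{E}_{q_{\mathbf{\theta}}(\boldsymbol{\omega})}\big[\log p(\mathcal{D}, \boldsymbol{\omega})\big] - \mathbb{E}_{q_{\mathbf{\theta}}(\boldsymbol{\omega})}\big[\log q_{\mathbf{\theta}}(\boldsymbol{\omega})\big]
$$

Because the KL divergence is non-negative, $\mathcal{F}[q_{\mathbf{\theta}}] \le \log p(\mathcal{D})$, which is why it is a lower bound on the log-evidence. Because $\log p(\mathcal{D})$ does not depend on $\mathbf{\theta}$, any increase in $\mathcal{F}[q_{\mathbf{\theta}}]$ must be matched by an equal decrease in the KL term, so maximising one is the same as minimising the other.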