Improved uncertainty quantification for neural networks with Bayesian last layer

Felix Fiedler, Sergio Lucia

TL;DR

The paper tackles uncertainty quantification in neural networks by proposing neural networks with a Bayesian last layer (BLL), a tractable compromise between full Bayesian neural networks and traditional methods. It introduces an exact, gradient-friendly reformulation of the log-marginal likelihood that includes the last-layer weights as optimization variables, enabling efficient backpropagation without inverting large matrices. A novel extrapolation-aware mechanism based on an affine-cost interpretation ties predictive uncertainty to the geometry of learned features and introduces an adaptive penalty parameter $\alpha$ to improve extrapolation quality, accompanied by a practical algorithm to tune it. The framework is extended to the multivariate setting with simplified training under reasonable assumptions, and it is compared against Bayes by Backprop and BLR with NN features through simulation, showing superior log-predictive density and controllable extrapolation behavior. Overall, the work provides a scalable, analytically tractable approach to uncertainty quantification with strong extrapolation handling and favorable performance relative to full BNNs.

Abstract

Uncertainty quantification is an important task in machine learning, a task in which standard neural networks (NNs) have traditionally not excelled. This can be a limitation for safety-critical applications, where uncertainty-aware methods like Gaussian processes or Bayesian linear regression are often preferred. Bayesian neural networks are an approach to address this limitation. They assume probability distributions for all parameters and yield distributed predictions. However, training and inference are typically intractable and approximations must be employed. A promising approximation is NNs with Bayesian last layer (BLL). They assume distributed weights only in the linear output layer and yield a normally distributed prediction. To approximate the intractable Bayesian neural network, point estimates of the distributed weights in all but the last layer should be obtained by maximizing the marginal likelihood. This has previously been challenging, as the marginal likelihood is expensive to evaluate in this setting. We present a reformulation of the log-marginal likelihood of a NN with BLL which allows for efficient training using backpropagation. Furthermore, we address the challenge of uncertainty quantification for extrapolation points. We provide a metric to quantify the degree of extrapolation and derive a method to improve the uncertainty quantification for these points. Our methods are derived for the multivariate case and demonstrated in a simulation study. In comparison to Bayesian linear regression with fixed features, and a Bayesian neural network trained with variational inference, our proposed method achieves the highest log-predictive density on test data.
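To illustrate why the cost of evaluating the marginal likelihood matters, the following minimal sketch computes the log-marginal likelihood of Bayesian linear regression on fixed features in two equivalent ways: once through the $m \times m$ covariance of the targets, and once through the smaller feature-space precision matrix. This is the textbook identity for Bayesian linear regression, not the paper's reformulation with the last-layer weights as optimization variables. The names `sigma_w`, `sigma_e`, `prec_w`, `prec_e` and the random `tanh` features standing in for a trained network are assumptions made for this example.

```python
import numpy as np

def lml_data_space(Phi, t, sigma_w, sigma_e):
    """Log-marginal likelihood via the m x m target covariance."""
    m = Phi.shape[0]
    # Marginal covariance of the targets: sigma_w^2 * Phi Phi^T + sigma_e^2 * I
    C = sigma_w**2 * Phi @ Phi.T + sigma_e**2 * np.eye(m)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (t @ np.linalg.solve(C, t) + logdet + m * np.log(2 * np.pi))

def lml_feature_space(Phi, t, sigma_w, sigma_e):
    """Same quantity via the n x n posterior precision (n = number of features)."""
    m, n = Phi.shape
    prec_w, prec_e = sigma_w**-2, sigma_e**-2
    # Posterior precision and posterior mean of the last-layer weights
    A = prec_e * Phi.T @ Phi + prec_w * np.eye(n)
    w_bar = prec_e * np.linalg.solve(A, Phi.T @ t)
    resid = t - Phi @ w_bar
    fit = 0.5 * prec_e * resid @ resid + 0.5 * prec_w * w_bar @ w_bar
    _, logdet = np.linalg.slogdet(A)
    return (0.5 * n * np.log(prec_w) + 0.5 * m * np.log(prec_e)
            - fit - 0.5 * logdet - 0.5 * m * np.log(2 * np.pi))

# Hypothetical stand-in for learned features: a random tanh layer.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Phi = np.tanh(X @ rng.normal(size=(2, 5)))
t = Phi @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

print(lml_data_space(Phi, t, sigma_w=1.0, sigma_e=0.1))
print(lml_feature_space(Phi, t, sigma_w=1.0, sigma_e=0.1))
```

Both calls should agree up to floating-point error. The feature-space form only solves an $n \times n$ system, which is the cheaper option whenever the number of features is much smaller than the number of training samples.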

Paper Structure (17 sections, 4 theorems, 59 equations, 5 figures, 1 table, 1 algorithm)

Key Result

Lemma 1

Suppose Assumptions \ref{ass:NN_linear_activation_function} to \ref{ass:NN_last_layer_weights_prior} hold. Then the predicted outputs are normally distributed, where ${\bm{\Phi}} = {\bm{\phi}}({\bm{X}};{\mathbb{W}}_L)$ is the feature matrix for the training data and ${\bm{\phi}} = {\bm{\phi}}({\bm{x}};{\mathbb{W}}_L)$ is the feature matrix for the test data. The distribution is stated in terms of a precision matrix.
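For orientation, a result of this kind typically takes the standard Bayesian linear regression form sketched below. This sketch assumes a scalar output, a zero-mean isotropic Gaussian prior with variance $\sigma_w^2$ on the last-layer weights, additive Gaussian noise with variance $\sigma_e^2$, training targets ${\bm{t}}$, and a single test feature vector ${\bm{\phi}}$ written as a column. The symbols $\sigma_w$, $\sigma_e$, ${\bm{t}}$, $\bar{{\bm{w}}}$ and ${\bm{\Lambda}}_p$ are assumptions of this sketch and may differ from the paper's notation.

```latex
\begin{align}
  {\bm{\Lambda}}_p &= \sigma_e^{-2}\,{\bm{\Phi}}^\top{\bm{\Phi}} + \sigma_w^{-2}\,{\bm{I}},
  && \text{(posterior precision of the weights)} \\
  \bar{{\bm{w}}} &= \sigma_e^{-2}\,{\bm{\Lambda}}_p^{-1}{\bm{\Phi}}^\top{\bm{t}},
  && \text{(posterior mean of the weights)} \\
  p(y \mid {\bm{x}}, {\bm{X}}, {\bm{t}}) &= \mathcal{N}\!\left(y \,\middle|\, {\bm{\phi}}^\top\bar{{\bm{w}}},\;
     {\bm{\phi}}^\top{\bm{\Lambda}}_p^{-1}{\bm{\phi}} + \sigma_e^2\right).
  && \text{(predictive distribution)}
\end{align}
```

The term ${\bm{\phi}}^\top{\bm{\Lambda}}_p^{-1}{\bm{\phi}}$ grows as the test features move away from the directions covered by the training features, which is what ties the predictive variance to the geometry of the feature space.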

Figures (5)

  • Figure 1: Comparison of convex hull (Definition \ref{def:Convex_hull}), span (Definition \ref{def:Span}), affine hull (Definition \ref{def:Affine_hull}) and affine cost (Definition \ref{def:Affine_cost}) for two exemplary sets of features $\tilde{{\bm{\Phi}}}\in{\mathbb{R}}^{m\times n_{\tilde{\phi}}}$, both with $m=3$ and $n_{\tilde{\phi}}=2$.
  • Figure 2: NN with BLL: Predicted mean and standard deviation \ref{eq:NN_BLL_distribution} and feature space with $n_{\tilde{\phi}}=2$ for $m=3$ training samples. The effect of parameter $\alpha$ on the extrapolation uncertainty is shown by comparing the optimal $\alpha^*$ (maximization of LML \ref{eq:LMLH_alpha_cost}) with the suggested improved $\alpha^{\max}$.
  • Figure 3: Effect of $\log(\alpha)$ on the LML for a trained NN and the mean log-predictive density \ref{eq:Mean_predictive_probability}. The same regression problem and NN as in Figure \ref{fig:Funda_08_Pred_and_Featurespace_alpha_comparison} are considered.
  • Figure 4: Multivariate neural network with Bayesian last layer: Predicted mean and standard deviation for two outputs with different and unknown noise levels. Training uses the proposed Algorithm \ref{alg:BLL_and_alpha}, which maximizes the log-marginal likelihood in \ref{eq:LMLH_multivariate_simplified} and yields the optimal parameter $\alpha^*$. This value can then be adapted to improve the uncertainty quantification in the extrapolation regime by maximizing \ref{eq:Mean_predictive_probability}, yielding $\alpha^{\max}$.
  • Figure 5: Variational inference for a BNN with Bayes by Backprop. Sampled ($N=100$) predictive distribution with \ref{eq:BNN_predictive_distribution_approx}.
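The captions for Figures 3 and 4 refer to a mean log-predictive density that is maximized to obtain $\alpha^{\max}$. As a minimal sketch, and assuming this metric is the average Gaussian log-density of held-out targets under scalar predictive means and standard deviations, it could be evaluated as follows. The function name and the toy numbers are hypothetical, and the paper's exact definition may differ.

```python
import math

def mean_log_predictive_density(y, mean, std):
    """Average Gaussian log-density of targets y under N(mean_i, std_i^2)."""
    if not (len(y) == len(mean) == len(std)):
        raise ValueError("y, mean and std must have the same length")
    total = 0.0
    for yi, mi, si in zip(y, mean, std):
        # log N(yi | mi, si^2)
        total += -0.5 * math.log(2 * math.pi * si**2) - 0.5 * ((yi - mi) / si) ** 2
    return total / len(y)

# Toy extrapolation point: the prediction is off by 3 units.
y_test, y_pred = [3.0], [0.0]
overconfident = mean_log_predictive_density(y_test, y_pred, [0.1])
widened = mean_log_predictive_density(y_test, y_pred, [3.0])
print(overconfident, widened)
```

In this toy case the widened standard deviation scores higher than the overconfident one, which is the kind of behavior that adapting a parameter governing extrapolation uncertainty is meant to exploit.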

Theorems & Definitions (15)

  • Lemma 1
  • proof
  • Lemma 2: Log-marginal likelihood
  • proof
  • Theorem 1: Augmented log-marginal likelihood maximization
  • proof
  • Definition 1: Convex hull
  • Definition 2: Interpolation
  • Definition 3: Distance
  • Definition 4: Span
  • ...and 5 more