Multivariate Bayesian Last Layer for Regression with Uncertainty Quantification and Decomposition

Han Wang, Eiji Kawasaki, Guillaume Damblin, Geoffrey Daniel

TL;DR

This work develops Multivariate Bayesian Last Layer (MBLL) models for regression with heteroscedastic noise, providing closed-form posterior predictive expressions and explicit decomposition of uncertainty into aleatoric and epistemic components. By decoupling feature learning from Bayesian last-layer inference, MBLL enables single-pass uncertainty quantification and extends to matrix-variate and matrix-T regimes to handle unknown noise covariance. The authors introduce an evidential framework for hyperparameter learning and an EM algorithm that stabilizes training, supports transfer learning, and yields principled uncertainty estimates. They also analyze the theoretical properties of the framework, including degeneracy in the unregularized evidence objective and conditions under which it is avoided with regularization, and demonstrate practical performance through synthetic and real-data experiments, including transfer-learning scenarios and time-series forecasting. Overall, MBLL provides a scalable, uncertainty-aware extension to deep networks with principled uncertainty decomposition, offering a pathway to robust deployment in multivariate and heteroscedastic settings.

Abstract

We present new Bayesian Last Layer neural network models in the setting of multivariate regression under heteroscedastic noise, and propose EM algorithms for parameter learning. Bayesian modeling of a neural network's final layer has the attractive property of uncertainty quantification with a single forward pass. The proposed framework is capable of disentangling the aleatoric and epistemic uncertainty, and can be used to enhance a canonically trained deep neural network with uncertainty-aware capabilities.
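To make the single-forward-pass idea concrete, here is a minimal sketch of a Bayesian last layer in a deliberately simplified setting. It is not the paper's multivariate, heteroscedastic model: it assumes a scalar output, a known constant noise variance `sigma2`, and an isotropic Gaussian prior with precision `alpha` on the last-layer weights. The feature map `features` is a hypothetical stand-in for the body of a trained network, and all function names are illustrative. Under these assumptions the weight posterior is Gaussian in closed form, and the predictive variance splits into an epistemic term (from weight uncertainty) and an aleatoric term (from observation noise).

```python
import numpy as np

def features(x, W, b):
    """Hypothetical fixed feature map standing in for a trained network body."""
    return np.tanh(x[:, None] * W[None, :] + b[None, :])

def fit_last_layer(Phi, y, alpha, sigma2):
    """Closed-form Gaussian posterior N(m, S) over the last-layer weights.

    Assumes prior w ~ N(0, I / alpha) and noise variance sigma2 (both known).
    """
    d = Phi.shape[1]
    precision = alpha * np.eye(d) + Phi.T @ Phi / sigma2
    S = np.linalg.inv(precision)
    m = S @ Phi.T @ y / sigma2
    return m, S

def predict(Phi_new, m, S, sigma2):
    """Predictive mean plus epistemic and aleatoric variance per test point."""
    mean = Phi_new @ m
    # Epistemic part: phi(x)^T S phi(x), the variance induced by weight uncertainty.
    epistemic = np.einsum("nd,de,ne->n", Phi_new, S, Phi_new)
    # Aleatoric part: the (here constant) observation-noise variance.
    aleatoric = np.full(Phi_new.shape[0], sigma2)
    return mean, epistemic, aleatoric

# Toy data: noisy sine observations on [-1, 1].
rng = np.random.default_rng(0)
W = rng.normal(size=20)
b = rng.normal(size=20)
alpha, sigma2 = 1.0, 0.01
x_train = rng.uniform(-1.0, 1.0, size=50)
y_train = np.sin(3.0 * x_train) + rng.normal(scale=np.sqrt(sigma2), size=50)

m, S = fit_last_layer(features(x_train, W, b), y_train, alpha, sigma2)

# One feature evaluation per test point yields the mean and both variance terms.
x_test = np.array([0.0, 5.0])
Phi_test = features(x_test, W, b)
mean, epistemic, aleatoric = predict(Phi_test, m, S, sigma2)
total_variance = epistemic + aleatoric
```

The total predictive variance is the sum of the two terms, which is the law of total variance applied to this Gaussian model. In the heteroscedastic setting the abstract describes, the aleatoric term would vary with the input rather than being a constant.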

Paper Structure (39 sections, 17 theorems, 65 equations, 13 figures, 3 tables, 3 algorithms)

Key Result

Theorem 2.1

Suppose that the noise covariance $V$ is known. With the symbols defined in Table \ref{tab:symbols}, the following distributions hold:

Figures (13)

  • Figure 1: Components of the log-evidence, $\ln\left| \Omega \right|$ and $\mathop{\mathrm{trace}}\nolimits\left( \Omega^{-1} E^\top V^{-1} E \right)$, as functions of $k$ for the example in Figure \ref{fig:em_fixedbasis}. The abscissas of the panels in the second row are on a logarithmic scale.
  • Figure 2: The EM algorithm (Algorithm \ref{alg:ELBO_EM_framework}) for Bayesian interpolation with a DNN as the basis function. We hold $M=0$ fixed and restrict $K$ to be isotropic. Color shades correspond to 1, 2, and 3 standard deviations.
  • Figure 3: Convergence of EM for the experiment in Figure \ref{fig:em_dnn}. The algorithm terminates in fewer than 30 steps, with relative changes below $10^{-3}$.
  • Figure 4: Transfer learning approach for the experiment of Figure \ref{fig:em_dnn}. Algorithm \ref{alg:ELBO_EM_framework} is applied to adapt the BLL model with a fixed pretrained probability density network.
  • Figure 5: Predictions of Beijing air quality using a nonlinear (1,2)-VARX BLL model, with $\nu=15$ and isotropic $K$ and $\Sigma$. Negative values are due to data standardization.
  • ...and 8 more figures

Theorems & Definitions (20)

  • Theorem 2.1: Bayesian regression
  • Remark 2.1
  • Proposition 2.2
  • Proposition 2.3: Decomposition of uncertainties
  • Proposition 2.4
  • Theorem 2.5: Joint minimization
  • Remark 2.2
  • Remark 3.1
  • Theorem 4.1: Bayesian regression: $V$ unknown
  • Proposition 4.2: Decomposition of uncertainties: $V$ unknown
  • ...and 10 more