Bayesian neural networks with interpretable priors from Mercer kernels
Alex Alberts, Ilias Bilionis
TL;DR
The paper addresses uncertainty quantification in neural networks by introducing Mercer priors: priors over the BNN parameters constructed from the Mercer (spectral) representation of a target Gaussian-process kernel, so that the network outputs approximate GP draws, $u_{\theta}\sim\mathcal{N}(0,S)$, with $S$ the covariance of the target GP. This approach preserves GP interpretability while retaining the scalability of neural networks, and it uses stochastic gradient Langevin dynamics with unbiased estimators to draw samples from $p(\theta)$. The authors demonstrate the method through Brownian-motion and Brownian-bridge case studies, then apply Mercer priors to three problems: GP regression with heteroscedastic noise, a periodic BNN, and a nonlinear PDE inverse problem. They analyze how the spectral truncation level $K$ and the network width affect fidelity to the target GP, discuss convergence in the infinite-width limit, and outline open theoretical questions concerning hyperparameters and rigorous convergence. Overall, Mercer priors offer a principled, scalable framework for injecting GP-like priors into BNNs for uncertainty quantification and scientific inverse problems.
Abstract
Quantifying the uncertainty in the output of a neural network is essential for deployment in scientific or engineering applications where decisions must be made under limited or noisy data. Bayesian neural networks (BNNs) provide a framework for this purpose by constructing a Bayesian posterior distribution over the network parameters. However, the prior, which is of key importance in any Bayesian setting, is rarely meaningful for BNNs. The input-to-output map of a BNN is complex enough that it is difficult to understand how a given distribution over the parameters enforces any interpretable constraint on the output space. Gaussian processes (GPs), on the other hand, are often preferred in uncertainty quantification tasks due to their interpretability. Their drawback is that, without advanced techniques that often rely on the covariance kernel having a specific structure, GPs are limited to small datasets. To address these challenges, we introduce a new class of priors for BNNs, called Mercer priors, such that the resulting BNN has samples that approximate those of a specified GP. The method works by defining a prior directly over the network parameters from the Mercer representation of the covariance kernel, and does not rely on the network having a specific structure. In doing so, we can exploit the scalability of BNNs in a meaningful Bayesian way.
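To make the Mercer representation concrete, the following minimal sketch uses the Brownian-motion kernel $k(s,t)=\min(s,t)$ on $[0,1]$, whose eigenpairs are known in closed form: $\lambda_k = 1/((k-\tfrac{1}{2})^2\pi^2)$ and $\phi_k(t)=\sqrt{2}\sin((k-\tfrac{1}{2})\pi t)$. It illustrates only the truncated spectral expansion and the effect of the truncation level $K$; it is not the authors' prior construction or their sampler, and all function names are hypothetical.

```python
import numpy as np

def brownian_eigenpairs(K, t):
    """First K Mercer eigenpairs of k(s, t) = min(s, t) on [0, 1]."""
    k = np.arange(1, K + 1)
    freq = (k - 0.5) * np.pi
    lam = 1.0 / freq**2                              # eigenvalues, shape (K,)
    phi = np.sqrt(2.0) * np.sin(np.outer(t, freq))   # eigenfunctions, shape (len(t), K)
    return lam, phi

def truncated_kernel(K, t):
    """Truncated Mercer sum: sum_k lam_k * phi_k(s) * phi_k(t)."""
    lam, phi = brownian_eigenpairs(K, t)
    return (phi * lam) @ phi.T

def sample_paths(K, t, n_samples, rng):
    """Approximate GP draws: sum_k sqrt(lam_k) * xi_k * phi_k(t), xi_k ~ N(0, 1)."""
    lam, phi = brownian_eigenpairs(K, t)
    xi = rng.standard_normal((n_samples, K))
    return (xi * np.sqrt(lam)) @ phi.T

t = np.linspace(0.0, 1.0, 101)
exact = np.minimum.outer(t, t)

# Maximum kernel error on the grid for increasing truncation levels K.
errors = {K: np.max(np.abs(truncated_kernel(K, t) - exact)) for K in (5, 50, 500)}

# Sample paths from the K = 50 truncation; each row is one path on the grid t.
paths = sample_paths(50, t, n_samples=2000, rng=np.random.default_rng(0))
```

The kernel error shrinks as $K$ grows, which is the truncation effect the paper studies. A Mercer prior, as described in the abstract, instead places the prior on the network parameters so that the BNN outputs play the role of the sampled paths here.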
