Posterior Inference on Shallow Infinitely Wide Bayesian Neural Networks under Weights with Unbounded Variance
Jorge Loría, Anindya Bhadra
TL;DR
This work develops posterior inference for shallow Bayesian neural networks in the infinite-width limit under unbounded weight variance, where the limit is an $\alpha$-stable process. By representing the stable density as a normal scale mixture, the authors obtain a conditionally Gaussian form for the data, which makes posterior inference tractable with standard Gaussian machinery. They derive an explicit posterior predictive density, propose an MCMC sampler, and validate the method on one- and two-dimensional synthetic problems as well as a real-world Taipei real estate dataset, demonstrating improved performance for functions with jumps and sharp discontinuities. The approach yields non-degenerate posteriors for the latent kernel quantity and provides a practical, flexible framework for uncertainty quantification in non-Gaussian neural-network limits, with potential extensions to deeper architectures and CNNs.
Abstract
From the classical and influential works of Neal (1996), it is known that the infinite-width scaling limit of a Bayesian neural network with one hidden layer is a Gaussian process, when the network weights have bounded prior variance. Neal's result has been extended to networks with multiple hidden layers and to convolutional neural networks, also with Gaussian process scaling limits. The tractable properties of Gaussian processes then allow straightforward posterior inference and uncertainty quantification, considerably simplifying the study of the limit process compared to a network of finite width. Neural network weights with unbounded variance, however, pose unique challenges. In this case, the classical central limit theorem breaks down and it is well known that the scaling limit is an $\alpha$-stable process under suitable conditions. However, the current literature is primarily limited to forward simulations under these processes, and the problem of posterior inference under such a scaling limit remains largely unaddressed, unlike in the Gaussian process case. To this end, our contribution is an interpretable and computationally efficient procedure for posterior inference, using a conditionally Gaussian representation, that then allows full use of the Gaussian process machinery for tractable posterior inference and uncertainty quantification in the non-Gaussian regime.
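The conditionally Gaussian representation mentioned above rests on a standard fact: a symmetric $\alpha$-stable variable with $0 < \alpha < 2$ can be written as $X = \sqrt{2S}\,Z$, where $Z$ is standard normal and $S$ is an independent positive $(\alpha/2)$-stable variable with Laplace transform $\exp(-s^{\alpha/2})$, so that $E[e^{itX}] = e^{-|t|^{\alpha}}$. The sketch below illustrates this one-dimensional identity only; it is not the authors' sampler or posterior procedure. The function names are hypothetical, and the use of Kanter's representation to draw $S$ is an assumption made here for illustration.

```python
import numpy as np


def sample_positive_stable(a, size, rng):
    """Draw S > 0 with E[exp(-s * S)] = exp(-s**a) for 0 < a < 1.

    Uses Kanter's representation (an illustrative choice, not taken from the paper).
    """
    u = rng.uniform(0.0, np.pi, size)   # uniform angle on (0, pi)
    e = rng.exponential(1.0, size)      # unit-rate exponential
    first = np.sin(a * u) / np.sin(u) ** (1.0 / a)
    second = (np.sin((1.0 - a) * u) / e) ** ((1.0 - a) / a)
    return first * second


def sample_symmetric_stable(alpha, size, rng):
    """Draw X with characteristic function exp(-|t|**alpha) as a normal scale mixture."""
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie strictly between 0 and 2")
    s = sample_positive_stable(alpha / 2.0, size, rng)  # latent mixing scale
    z = rng.standard_normal(size)                       # Gaussian given the scale
    return np.sqrt(2.0 * s) * z


rng = np.random.default_rng(0)
alpha, t = 1.5, 1.0
x = sample_symmetric_stable(alpha, 200_000, rng)

# Compare the empirical characteristic function with exp(-|t|**alpha).
print("empirical:", np.mean(np.cos(t * x)))
print("target:   ", np.exp(-abs(t) ** alpha))
```

Conditional on the latent scale `s`, each draw is Gaussian with variance `2 * s`, which is the property that lets Gaussian tools be reused once the scale is sampled or integrated out.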
