Posterior Inference on Shallow Infinitely Wide Bayesian Neural Networks under Weights with Unbounded Variance

Jorge Loría, Anindya Bhadra

TL;DR

This work develops posterior inference for shallow Bayesian neural networks in the infinite-width limit under unbounded weight variance, where the limit is an $\alpha$-stable process. By representing the stable density as a normal scale mixture, the authors obtain a conditionally Gaussian form for the data, enabling tractable posterior inference with Gaussian machinery. They derive an explicit posterior predictive density, propose an MCMC sampler, and validate the method on one- and two-dimensional synthetic problems as well as a real-world Taipei real estate dataset, demonstrating improved performance for functions with jumps and sharp discontinuities. The approach yields non-degenerate posteriors for the latent kernel quantity and provides a practical, flexible framework for uncertainty quantification in non-Gaussian neural-network limits, with potential extensions to deeper architectures and CNNs.

Abstract

From the classical and influential works of Neal (1996), it is known that the infinite width scaling limit of a Bayesian neural network with one hidden layer is a Gaussian process, when the network weights have bounded prior variance. Neal's result has been extended to networks with multiple hidden layers and to convolutional neural networks, also with Gaussian process scaling limits. The tractable properties of Gaussian processes then allow straightforward posterior inference and uncertainty quantification, considerably simplifying the study of the limit process compared to a network of finite width. Neural network weights with unbounded variance, however, pose unique challenges. In this case, the classical central limit theorem breaks down and it is well known that the scaling limit is an $\alpha$-stable process under suitable conditions. However, current literature is primarily limited to forward simulations under these processes and the problem of posterior inference under such a scaling limit remains largely unaddressed, unlike in the Gaussian process case. To this end, our contribution is an interpretable and computationally efficient procedure for posterior inference, using a conditionally Gaussian representation, that then allows full use of the Gaussian process machinery for tractable posterior inference and uncertainty quantification in the non-Gaussian regime.
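
The conditionally Gaussian representation mentioned above rests on a standard fact: a symmetric $\alpha$-stable variable can be written as a Gaussian whose variance is multiplied by an independent positive $(\alpha/2)$-stable variable. The following is a minimal sketch of that fact only, not the authors' sampler. The function names, the unit-scale parameterization (characteristic function $\exp(-|t|^{\alpha})$), and the use of Kanter's representation for the positive stable draw are assumptions made for illustration; the paper's own scaling may differ.

```python
import numpy as np

def sample_positive_stable(a, size, rng):
    """Kanter's representation (assumed here): draws S > 0 with
    Laplace transform E[exp(-s * S)] = exp(-s**a), for 0 < a < 1."""
    u = rng.uniform(0.0, np.pi, size)   # U ~ Uniform(0, pi)
    e = rng.exponential(1.0, size)      # E ~ Exp(1)
    return (np.sin(a * u) / np.sin(u) ** (1.0 / a)
            * (np.sin((1.0 - a) * u) / e) ** ((1.0 - a) / a))

def sample_symmetric_stable(alpha, size, rng):
    """Normal scale mixture: X = sqrt(S) * Z with S positive (alpha/2)-stable
    and Z ~ N(0, 2). Conditionally on S, X is Gaussian with variance 2 * S."""
    s = sample_positive_stable(alpha / 2.0, size, rng)
    z = rng.normal(0.0, np.sqrt(2.0), size)
    return np.sqrt(s) * z

rng = np.random.default_rng(0)
alpha = 1.5                             # illustrative index in (0, 2)
x = sample_symmetric_stable(alpha, 200_000, rng)

# Compare the empirical characteristic function with exp(-|t|**alpha).
t = 1.0
empirical_cf = np.mean(np.cos(t * x))
target_cf = np.exp(-abs(t) ** alpha)
print(empirical_cf, target_cf)
```

Conditioning on the latent scale `s` is what makes Gaussian formulas applicable; marginalizing over it recovers the heavy tails of the stable law.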

Paper Structure (31 sections, 6 theorems, 18 equations, 15 figures, 4 tables, 3 algorithms)

This paper contains 31 sections, 6 theorems, 18 equations, 15 figures, 4 tables, 3 algorithms.

Key Result

Proposition 1

(Der and Lee, 2005). Let the network specified by Equations eq:nn1 and eq:nn2, with a single hidden layer ($K=2$), have i.i.d. hidden-to-output weights $w_{j}^{(2)}$ distributed as symmetric $\alpha$-stable with scale parameter $(\nu/2)^{1/2} p_{2}^{-1/\alpha}$. Then $y(\mathbf{x})$ converges in distribution [...] where angle brackets denote the inner product, $\mathbf{t} = (t_1,\ldots,t_n)$ is the argument of t[...]

Figures (15)

  • Figure 1: Left: Boxplots of mean absolute error of out-of-sample prediction over test points, and Right: predicted values over 100 points on a regular grid on $[-2,2]$. Training points in black dots.
  • Figure 2: The point-wise $90\%$ posterior predictive intervals for GP Bayes and Stable over 100 points on a regular grid on $[-2,2]$, training points in black.
  • Figure 3: Left: Boxplots of mean absolute error (MAE) of out-of-sample prediction over test points, and Right: predicted values over a $9\times 9$ grid on $[-1,1]^2$.
  • Figure 4: Posterior predictive quantiles at the $5\%,50\%$, and $95\%$ levels for GP Bayes (upper) and Stable (lower) over a $9\times 9$ grid on $[-1,1]^2$.
  • Figure 5: Posterior predictive quantiles at the 5%, 50%, and 95% levels for GP Bayes (upper) and Stable (lower) on validation.
  • ...and 10 more figures

Theorems & Definitions (9)

  • Proposition 1
  • Theorem 1
  • Corollary 1
  • proof
  • Corollary 2
  • proof
  • Proposition 2
  • proof
  • Theorem S.1