From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference

Gracielle Antunes de Araújo; Flávio B. Gonçalves

TL;DR

A new covariance function is proposed as a convex mixture of components induced by four widely used activation functions, and key properties including positive definiteness and both strict and practical identifiability under different input designs are characterized.
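As a hedged illustration of what a convex mixture of activation-induced components looks like, the kernel can be written as follows. The symbols $w_a$, $k_a$ and $\mathcal{A}$ are notation assumed for this sketch and may differ from the paper's own parameterization (for instance, the paper may carry an additional overall variance scale).

$$
k_{\mathrm{mix}}(\mathbf{x},\mathbf{x}') \;=\; \sum_{a\in\mathcal{A}} w_a\,k_a(\mathbf{x},\mathbf{x}'),
\qquad w_a \ge 0,\quad \sum_{a\in\mathcal{A}} w_a = 1,
\qquad \mathcal{A}=\{\text{tanh},\ \text{sigmoid},\ \text{ReLU},\ \text{LeakyReLU}\}.
$$

Positive semidefiniteness carries over from the components: for any inputs $\mathbf{x}^{(1)},\dots,\mathbf{x}^{(n)}$ and any coefficients $c_1,\dots,c_n\in\mathbb{R}$,

$$
\sum_{p,q=1}^{n} c_p c_q\,k_{\mathrm{mix}}(\mathbf{x}^{(p)},\mathbf{x}^{(q)})
\;=\; \sum_{a\in\mathcal{A}} w_a \sum_{p,q=1}^{n} c_p c_q\,k_a(\mathbf{x}^{(p)},\mathbf{x}^{(q)}) \;\ge\; 0,
$$

since each inner sum is nonnegative whenever $k_a$ is a valid covariance function and each $w_a$ is nonnegative.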

Abstract

In this work, we study scaling limits of shallow Bayesian neural networks (BNNs) via their connection to Gaussian processes (GPs), with an emphasis on statistical modeling, identifiability, and scalable inference. We first establish a general convergence result from BNNs to GPs by relaxing assumptions used in prior formulations, and we compare alternative parameterizations of the limiting GP model. Building on this theory, we propose a new covariance function defined as a convex mixture of components induced by four widely used activation functions, and we characterize key properties including positive definiteness and both strict and practical identifiability under different input designs. For computation, we develop a scalable maximum a posteriori (MAP) training and prediction procedure using a Nyström approximation, and we show how the Nyström rank and anchor selection control the cost-accuracy trade-off. Experiments on controlled simulations and real-world tabular datasets demonstrate stable hyperparameter estimates and competitive predictive performance at realistic computational cost.
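To make the role of the Nyström rank and anchors concrete, here is a minimal NumPy sketch of a Nyström-approximated GP predictive mean. It is not the authors' procedure: the RBF kernel is a stand-in for the paper's mixed kernel, hyperparameters are fixed rather than estimated by MAP, and the names `rbf_kernel`, `nystrom_features`, `nystrom_predict_mean`, `jitter` and `noise_var` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Stand-in kernel (assumption): squared-exponential, not the paper's mixed kernel."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def nystrom_features(X, anchors, kernel, jitter=1e-6):
    """Return Phi (n x m) with Phi @ Phi.T = K_nm (K_mm + jitter I)^{-1} K_mn."""
    m = anchors.shape[0]
    K_mm = kernel(anchors, anchors) + jitter * np.eye(m)
    L = np.linalg.cholesky(K_mm)              # K_mm = L L^T
    return np.linalg.solve(L, kernel(X, anchors).T).T   # K_nm L^{-T}

def nystrom_predict_mean(X_train, y, X_test, anchors, kernel, noise_var, jitter=1e-6):
    """GP regression mean under the rank-m Nystrom approximation of the kernel."""
    m = anchors.shape[0]
    Phi = nystrom_features(X_train, anchors, kernel, jitter)    # n x m
    Phi_s = nystrom_features(X_test, anchors, kernel, jitter)   # n_test x m
    # Push-through identity: only an m x m system is solved, so the cost is
    # O(n m^2) instead of the O(n^3) of the exact GP.
    A = Phi.T @ Phi + noise_var * np.eye(m)
    return Phi_s @ np.linalg.solve(A, Phi.T @ y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-2.0, 2.0, size=(200, 1))
    y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.normal(size=200)
    X_test = np.linspace(-2.0, 2.0, 5)[:, None]
    for m in (5, 20, 80):   # Nystrom rank = number of anchors
        anchors = X[rng.choice(200, size=m, replace=False)]  # random anchors (assumption)
        print(m, nystrom_predict_mean(X, y, X_test, anchors, rbf_kernel, 0.01))
```

In this sketch the rank `m` is the number of anchors, and the anchors are a uniformly random subset of the training inputs. Other selection rules (for example, clustering centers) plug in by changing only how `anchors` is built. Using all training points as anchors recovers the exact GP mean up to the jitter term.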

Paper Structure (54 sections, 4 theorems, 83 equations, 3 figures, 12 tables)


Introduction
Background and Related Work
Infinite-width limit for shallow BNN
Model: shallow BNN with one hidden layer
Limiting kernel and interpretation
Minimal regularity condition
General convergence theorem
Proof sketch.
Mixed kernel derived from a BNN
Kernels induced by nonlinear activations in BNNs
tanh.
Sigmoid.
ReLU.
LeakyReLU.
Mixed kernel from the GP limit of a BNN
...and 39 more sections

Key Result

Theorem 2.1

Consider the model (eq:shallow_bnn) with independent, centered priors on the parameters, with finite variances and the scaling $\mathbb{V}(v_{kj})=\sigma_v^2/H$. Assume $h$ and the priors are such that $\mathrm{Var}(f_k(\mathbf{x}))<\infty$ for all $\mathbf{x}$ (e.g., $\mathbb{E}[h(Z)^2]<\infty$). Then, fo[…] with kernel given by (eq:kernel_general_main), namely […] where $(Z,Z')$ is the pre-activation pair asso[…]
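The scaling in the theorem can be explored numerically. The sketch below draws a one-hidden-layer network from its prior and compares the empirical covariance of $f_k$ at two inputs with a Monte Carlo estimate of $\sigma_v^2\,\mathbb{E}[h(Z)h(Z')]$. Several choices here are assumptions that the truncated statement above does not confirm: Gaussian priors, no output bias, the pre-activation $Z=\mathbf{w}^\top\mathbf{x}+b$, and this form of the kernel. The names `sigma_w`, `sigma_b`, `sample_bnn_outputs` and `limiting_kernel_mc` are hypothetical.

```python
import numpy as np

def sample_bnn_outputs(X, H, n_draws, sigma_v=1.0, sigma_w=1.0, sigma_b=1.0,
                       h=np.tanh, rng=None):
    """Prior draws of f_k(x) = sum_j v_kj * h(w_j . x + b_j) for one output k.

    Assumptions: Gaussian priors, no output bias, Var(v_kj) = sigma_v^2 / H.
    Returns an array of shape (n_draws, n) for the n rows of X.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, I = X.shape
    W = rng.normal(0.0, sigma_w, size=(n_draws, H, I))
    b = rng.normal(0.0, sigma_b, size=(n_draws, H, 1))
    v = rng.normal(0.0, sigma_v / np.sqrt(H), size=(n_draws, H))
    Z = W @ X.T + b                            # pre-activations, (n_draws, H, n)
    return np.einsum("dh,dhn->dn", v, h(Z))    # sum over hidden units

def limiting_kernel_mc(X, n_samples, sigma_v=1.0, sigma_w=1.0, sigma_b=1.0,
                       h=np.tanh, rng=None):
    """Monte Carlo estimate of sigma_v^2 * E[h(Z) h(Z')] on the rows of X."""
    rng = np.random.default_rng() if rng is None else rng
    I = X.shape[1]
    W = rng.normal(0.0, sigma_w, size=(n_samples, I))
    b = rng.normal(0.0, sigma_b, size=(n_samples, 1))
    A = h(W @ X.T + b)                         # (n_samples, n)
    return sigma_v**2 * (A.T @ A) / n_samples

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.array([[0.5, -1.0], [1.0, 0.3]])
    print("kernel estimate:\n", limiting_kernel_mc(X, 200_000, rng=rng))
    for H in (1, 10, 100):
        F = sample_bnn_outputs(X, H=H, n_draws=20_000, rng=rng)
        cov = np.cov(F, rowvar=False)
        # Kurtosis of f_k at the first input; a Gaussian has kurtosis 3.
        kurt = np.mean(F[:, 0] ** 4) / np.mean(F[:, 0] ** 2) ** 2
        print(f"H={H}: kurtosis={kurt:.2f}, covariance=\n{cov}")
```

Under these assumptions the covariance matches the kernel for every $H$, because the $v_{kj}$ are independent and centered. What the width changes is the shape of the distribution, which is why the sketch also reports the kurtosis as a rough Gaussianity diagnostic.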

Figures (3)

Figure 1: Shallow feedforward network (one-hidden-layer MLP): $I$ inputs, $H$ hidden units, and $K$ outputs. The figure illustrates the regression case, where $g(\cdot)\equiv \mathrm{Id}(\cdot)$.
Figure 2: The column $\mathbf x^{(p)}\in\mathbb R^{I}$ is propagated through the network, producing the output vector $\mathbf f(\mathbf x^{(p)})\in\mathbb R^{K}$. Repeating this for $p=1,\dots,n$ and stacking the outputs as columns yields $\mathbf F=[\mathbf f(\mathbf x^{(1)})\ \cdots\ \mathbf f(\mathbf x^{(n)})]\in\mathbb R^{K\times n}$.
Figure 3: Limit of a wide one-hidden-layer network to a GP for output $k$. As $H\to\infty$, the vector $\bigl(f_k^{(1)},\dots,f_k^{(n)}\bigr)^\top$ converges in distribution to a multivariate normal with covariance $K(\mathbf{X},\mathbf{X})$, and the collection $\{f_k(\mathbf{x})\}_{\mathbf{x}}$ defines a GP $\mathcal{GP}(0,K)$.

Theorems & Definitions (6)

Theorem 2.1: Convergence of a shallow BNN to a GP
Proposition 3.1: Convergence with mixed activations (additive blocks)
Proposition 3.2: Identifiability of the mixed kernel
Proposition A.1: Polynomial growth implies a finite second moment
proof
proof
