Posterior concentrations of fully-connected Bayesian neural networks with general priors on the weights
Insung Kong, Yongdai Kim
TL;DR
This paper addresses the theoretical gap for Bayesian neural networks (BNNs) with i.i.d. Gaussian priors on the weights by introducing an approximation theory for non-sparse DNNs with bounded parameters and Leaky-ReLU activations. It proves near-minimax posterior concentration for BNNs when the true function lies in the Hölder class $\mathcal{H}_d^\beta(K)$, achieving the rate $\varepsilon_n = n^{-\beta/(2\beta+d)} \log^\gamma(n)$ with $\gamma>2$, under mild conditions on the prior. The key technical advance is a bounded-parameter approximation theorem (Theorem 1), which allows Gaussian and other general priors to attain near-optimal concentration. The results cover nonparametric Gaussian and logistic regression, and they extend to adaptation to unknown smoothness via random-width priors (Theorem 4) and to hierarchical composition structures that can mitigate the curse of dimensionality (Theorem 5). Overall, the work closes the gap between theory and the priors commonly used in practice, while offering pathways to adaptivity and structured-function modeling.
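To make the dependence of the rate on the smoothness $\beta$ and the input dimension $d$ concrete, the following minimal sketch evaluates $\varepsilon_n = n^{-\beta/(2\beta+d)} \log^\gamma(n)$ numerically. The function names and the default $\gamma = 2.5$ are illustrative assumptions (the source only requires $\gamma > 2$), and the rate is an asymptotic statement, so values at moderate $n$ can be dominated by the logarithmic factor.

```python
import math


def rate_exponent(beta: float, d: int) -> float:
    """Polynomial exponent beta / (2*beta + d) of the concentration rate."""
    if beta <= 0 or d <= 0:
        raise ValueError("beta and d must be positive")
    return beta / (2.0 * beta + d)


def concentration_rate(n: int, beta: float, d: int, gamma: float = 2.5) -> float:
    """Evaluate eps_n = n^(-beta/(2*beta+d)) * log(n)^gamma.

    gamma = 2.5 is an assumed default; the stated result only needs gamma > 2.
    """
    if n <= 1:
        raise ValueError("n must exceed 1 so that log(n) is positive")
    if gamma <= 2:
        raise ValueError("the stated rate requires gamma > 2")
    return n ** (-rate_exponent(beta, d)) * math.log(n) ** gamma


if __name__ == "__main__":
    # Same smoothness, growing input dimension: the exponent shrinks with d.
    for d in (1, 10, 100):
        print(d, rate_exponent(2.0, d), concentration_rate(10**12, 2.0, d))
```

For fixed $\beta$, the exponent $\beta/(2\beta+d)$ shrinks as $d$ grows, which is the curse of dimensionality that the hierarchical composition result (Theorem 5) is intended to mitigate.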
Abstract
Bayesian approaches for training deep neural networks (BNNs) have received significant interest and have been effectively utilized in a wide range of applications. There have been several studies on the properties of posterior concentrations of BNNs. However, most of these studies only demonstrate results for BNN models with sparse or heavy-tailed priors. Surprisingly, no theoretical results currently exist for BNNs using Gaussian priors, which are the most commonly used. The lack of theory arises from the absence of approximation results for deep neural networks (DNNs) that are non-sparse and have bounded parameters. In this paper, we present a new approximation theory for non-sparse DNNs with bounded parameters. Additionally, based on this approximation theory, we show that BNNs with non-sparse general priors can achieve near-minimax optimal posterior concentration rates to the true model.
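As a rough illustration of the model class discussed here, the sketch below draws one fully-connected Leaky-ReLU network from an i.i.d. Gaussian prior on all weights and biases and evaluates it at a point. The layer widths, the prior standard deviation, the Leaky-ReLU slope, and all function names are assumptions chosen for illustration; they are not the paper's exact specification.

```python
import random


def leaky_relu(x: float, slope: float = 0.01) -> float:
    """Leaky-ReLU activation; the slope value is an assumed default."""
    return x if x > 0 else slope * x


def sample_network(widths, rng, prior_sd=1.0):
    """Draw every weight and bias i.i.d. from N(0, prior_sd^2).

    widths lists the layer sizes, e.g. [d, 8, 8, 1] for input dimension d.
    Returns a list of (weight_matrix, bias_vector) pairs, one per layer.
    """
    layers = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        weights = [[rng.gauss(0.0, prior_sd) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [rng.gauss(0.0, prior_sd) for _ in range(fan_out)]
        layers.append((weights, biases))
    return layers


def forward(layers, x, slope=0.01):
    """Evaluate the network at x; the final layer is left linear."""
    h = list(x)
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        is_last = i == len(layers) - 1
        h = z if is_last else [leaky_relu(v, slope) for v in z]
    return h


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the prior draw is reproducible
    net = sample_network([3, 8, 8, 1], rng)
    print(forward(net, [0.1, 0.2, 0.3]))
```

Every parameter is nonzero with probability one under this prior, so the sampled network is non-sparse, which is the setting the abstract contrasts with sparse or heavy-tailed priors.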
