
Minimaxity and Admissibility of Bayesian Neural Networks

Daniel Andrew Coulson, Martin T. Wells

Abstract

Bayesian neural networks (BNNs) offer a natural probabilistic formulation for inference in deep learning models. Despite their popularity, their optimality has received limited attention through the lens of statistical decision theory. In this paper, we study decision rules induced by deep, fully connected feedforward ReLU BNNs in the normal location model under quadratic loss. We show that, for fixed prior scales, the induced Bayes decision rule is not minimax. We then propose a hyperprior on the effective output variance of the BNN prior that yields a superharmonic square-root marginal density, establishing that the resulting decision rule is simultaneously admissible and minimax. We further extend these results from the quadratic loss setting to the predictive density estimation problem with Kullback--Leibler loss. Finally, we validate our theoretical findings numerically through simulation.
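
For orientation, the decision-theoretic criterion the abstract invokes is the classical one: in the normal location model, the Bayes rule under quadratic loss is the posterior mean, which Brown's identity expresses through the marginal density $m_\pi$, and by the construction of Fourdrinier, Strawderman, and Wells (1998), superharmonicity of $\sqrt{m_\pi}$ implies minimaxity when $p \ge 3$. A minimal statement of these standard background facts (not restated from the paper itself):

$$X \sim \mathcal{N}_p(\boldsymbol{\theta}, I_p), \qquad m_\pi(\boldsymbol{x}) = \int_{\mathbb{R}^p} \phi_p(\boldsymbol{x} - \boldsymbol{\theta})\, \pi(\boldsymbol{\theta})\, d\boldsymbol{\theta},$$
$$\hat{\boldsymbol{\theta}}_\pi(\boldsymbol{x}) = \boldsymbol{x} + \nabla \log m_\pi(\boldsymbol{x}), \qquad \Delta \sqrt{m_\pi} \le 0 \text{ on } \mathbb{R}^p \;\Longrightarrow\; \hat{\boldsymbol{\theta}}_\pi \text{ is minimax for } p \ge 3.$$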

Paper Structure

This paper contains 29 sections, 11 theorems, 601 equations, and 6 figures.

Key Result

Lemma 1.1

The prior density of a depth-$d$ Bayesian ReLU neural network given in (bnnprior) can be represented as a mixture whose mixing density is
$$g(v) = \sum_{\boldsymbol{k}} w_{\boldsymbol{k}}\, g_{\boldsymbol{k}}(v), \qquad w_{\boldsymbol{k}} = \frac{1}{2^{n_{1}+\dots+n_{d-1}}} \prod_{\ell=1}^{d-1} \binom{n_{\ell}}{k_{\ell}}, \quad k_{\ell} \in \{1, \dots, n_{\ell}\},$$
where $g_{\boldsymbol{k}}(v)$ is the density function of the random variable $V_{\boldsymbol{k}} = 2^{d-1}\, \|\boldsymbol{x}\|^{2} \ldots$
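
The representation itself is elided above. A plausible form, assuming (as is standard for results of this kind, and consistent with the abstract's "effective output variance") that the lemma expresses the BNN prior as a Gaussian scale mixture over a variance $v$ with mixing density $g$:

$$\pi(\boldsymbol{\theta}) = \int_0^\infty \mathcal{N}_p(\boldsymbol{\theta};\, \boldsymbol{0},\, v I_p)\, g(v)\, dv, \qquad g(v) = \sum_{\boldsymbol{k}} w_{\boldsymbol{k}}\, g_{\boldsymbol{k}}(v).$$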

Figures (6)

  • Figure 1: Estimated risk for several decision rules in dimension $p=5$ as a function of $\|\boldsymbol{\theta}\|$. The plotted rules are the MLE, the fixed-scale BNN rule, the Beta-prime minimax shrinkage rule, and the dropout-BNN rule. For the BNN-based rules, the network depth is $d=3$, the hidden layer widths are $n_1=n_2=20$, and the layer scales are $\sigma_1=\sigma_2=\sigma_3=1$; for the dropout-BNN rule, the keep probabilities are $q_1=q_2=0.8$ with inverted dropout.
  • Figure 2: Estimated risk for several decision rules in dimension $p=50$ as a function of $\|\boldsymbol{\theta}\|$. The plotted rules are the MLE, the fixed-scale BNN rule, the Beta-prime minimax shrinkage rule, and the dropout-BNN rule. For the BNN-based rules, the network depth is $d=3$, the hidden layer widths are $n_1=n_2=20$, and the layer scales are $\sigma_1=\sigma_2=\sigma_3=1$; for the dropout-BNN rule, the keep probabilities are $q_1=q_2=0.8$ with inverted dropout.
  • Figure 3: Estimated risk for several decision rules in dimension $p=100$ as a function of $\|\boldsymbol{\theta}\|$. The plotted rules are the MLE, the fixed-scale BNN rule, the Beta-prime minimax shrinkage rule, and the dropout-BNN rule. For the BNN-based rules, the network depth is $d=3$, the hidden layer widths are $n_1=n_2=20$, and the layer scales are $\sigma_1=\sigma_2=\sigma_3=1$; for the dropout-BNN rule, the keep probabilities are $q_1=q_2=0.8$ with inverted dropout.
  • Figure 4: Estimated risk for several decision rules in dimension $p=5$ as a function of $\|\boldsymbol{\theta}\|$ under several sparsity regimes. The plotted rules are the MLE, the Beta-prime minimax shrinkage rule, and the horseshoe posterior mean. The true sparsity levels considered are $1$, $2$, and $5$.
  • Figure 5: Estimated risk for several decision rules in dimension $p=50$ as a function of $\|\boldsymbol{\theta}\|$ under several sparsity regimes. The plotted rules are the MLE, the Beta-prime minimax shrinkage rule, and the horseshoe posterior mean. The true sparsity levels considered are $1$, $2$, $5$, $10$, $25$, and $50$.
  • ...and 1 more figure (a sketch of the risk-estimation setup follows this list)
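
The risk curves in these figures are straightforward to reproduce in outline. Below is a minimal Monte Carlo sketch, assuming the normal location model with identity covariance and using the positive-part James–Stein estimator as a stand-in for the paper's shrinkage rules (the BNN, Beta-prime, and horseshoe rules are not specified in closed form here, so the stand-in is an assumption):

```python
# Minimal Monte Carlo sketch of risk estimation in the normal location
# model. Assumption: the positive-part James-Stein rule stands in for
# the paper's shrinkage rules, which are not given in closed form here.
import numpy as np

rng = np.random.default_rng(0)

def mle(x):
    # Maximum likelihood estimator: delta(x) = x.
    return x

def james_stein(x):
    # Positive-part James-Stein shrinkage toward the origin.
    p = x.shape[-1]
    shrink = np.maximum(0.0, 1.0 - (p - 2) / np.sum(x**2, axis=-1, keepdims=True))
    return shrink * x

def estimated_risk(rule, theta, n_rep=100_000):
    # Monte Carlo estimate of E_theta ||rule(X) - theta||^2, X ~ N_p(theta, I).
    x = theta + rng.standard_normal((n_rep, theta.size))
    return np.mean(np.sum((rule(x) - theta) ** 2, axis=-1))

p = 50
for norm in np.linspace(0.0, 10.0, 6):
    theta = np.zeros(p)
    theta[0] = norm  # risk of these rules depends on theta only via ||theta||
    print(f"||theta||={norm:4.1f}  MLE={estimated_risk(mle, theta):6.2f}  "
          f"JS={estimated_risk(james_stein, theta):6.2f}")
```

Because both rules above are spherically symmetric, their risk depends on $\boldsymbol{\theta}$ only through $\|\boldsymbol{\theta}\|$, so sweeping a single coordinate traces the full risk curve as in the figures.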

Theorems & Definitions (22)

  • Lemma 1.1
  • Lemma 2.1
  • Lemma 2.2
  • Lemma 2.3
  • Theorem 2.4
  • Theorem 2.5
  • Theorem 2.6
  • Theorem 3.1
  • Corollary 3.2
  • Theorem 4.1
  • ...and 12 more