Central Limit Theorem for Bayesian Neural Network trained with Variational Inference

Arnaud Descours; Tom Huix; Arnaud Guillin; Manon Michel; Éric Moulines; Boris Nectoux

TL;DR

This work derives Central Limit Theorems for a two-layer Bayesian neural network trained by variational inference in the infinite-width regime, covering three SGD schemes: idealized SGD with exact Gaussian integrals, Bayes-by-Backprop (BbB) SGD with Monte Carlo estimates, and Minimal VI (MiVI) SGD. It proves that the centered empirical measure fluctuations converge to a Gaussian process driven by an SPDE, with the limiting covariance differing between MiVI and the BbB/Idealized schemes. The LLN groundwork from prior mean-field analyses is extended to obtain a full fluctuation theory, and numerical experiments show MiVI achieves substantial computational gains despite larger finite-width variances. These results provide a rigorous, trajectorial understanding of VI-trained BNN behavior and guide practical algorithm choice by balancing variance against computational cost.

Abstract

In this paper, we rigorously derive Central Limit Theorems (CLT) for Bayesian two-layer neural networks in the infinite-width limit, trained by variational inference on a regression task. The networks are trained via different maximization schemes of the regularized evidence lower bound: (i) the idealized case with exact estimation of a multiple Gaussian integral from the reparametrization trick, (ii) a minibatch scheme using Monte Carlo sampling, commonly known as Bayes-by-Backprop, and (iii) a computationally cheaper algorithm named Minimal VI. The latter was recently introduced by leveraging the information obtained at the level of the mean-field limit. Laws of large numbers have already been rigorously proven for the three schemes, which admit the same asymptotic limit. By deriving CLTs, this work shows that the idealized and Bayes-by-Backprop schemes have similar fluctuation behavior, which differs from that of Minimal VI. Numerical experiments then illustrate that the Minimal VI scheme is still more efficient, in spite of larger variances, thanks to its substantial gain in computational complexity.
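
To make the Monte Carlo ingredient of scheme (ii) concrete, here is a minimal NumPy sketch of the reparametrization trick for a two-layer network with mean-field $1/N$ output scaling. It is an illustration under stated assumptions, not the authors' algorithm: the `tanh` activation, the softplus parametrization of the standard deviation, and all function and variable names (`softplus`, `sample_weights`, `mc_prediction`, `m`, `rho`) are hypothetical choices made for this example.

```python
import numpy as np

def softplus(rho):
    # Assumed parametrization: maps an unconstrained rho to a nonnegative std.
    return np.logaddexp(0.0, rho)

def sample_weights(m, rho, rng):
    # Reparametrization trick: w = m + softplus(rho) * z with z ~ N(0, I).
    z = rng.standard_normal(m.shape)
    return m + softplus(rho) * z

def mc_prediction(m, rho, x, n_samples, rng):
    # Monte Carlo estimate of (1/N) * sum_i E_z[tanh(<w_i, x>)].
    # The 1/N average over neurons is the mean-field scaling.
    total = 0.0
    for _ in range(n_samples):
        w = sample_weights(m, rho, rng)  # shape (N, d)
        total += np.tanh(w @ x).mean()
    return total / n_samples

rng = np.random.default_rng(0)
N, d = 200, 3                        # width and input dimension (arbitrary)
m = rng.standard_normal((N, d))      # variational means, one row per neuron
rho = np.full((N, d), -1.0)          # unconstrained std parameters
x = np.ones(d) / np.sqrt(d)          # a single input point
estimate = mc_prediction(m, rho, x, n_samples=50, rng=rng)
```

Averaging over more Gaussian draws lowers the Monte Carlo noise of the estimate at a proportional cost. The abstract describes this same trade-off between variance and computational cost when it compares the three schemes.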

Paper Structure (29 sections, 22 theorems, 190 equations, 2 figures)

Introduction
Related works.
Setting and proven mean-field limit
Variational Inference and Evidence Lower Bound
The Evidence Lower Bound
Loss function and prior distribution
Stochastic Gradient Descent and maximization algorithms
Idealized SGD
Bayes-by-Backprop (BbB) SGD
Minimal VI (MiVI) SGD
Mean-field limit and Law of Large Numbers
Empirical distributions and assumptions
Law of Large Numbers for the sequence of rescaled empirical distribution
Main results: Central Limit Theorems
Numerical simulations
...and 14 more sections

Key Result

Theorem 1

Let $\gamma_0> 1+ \frac{d+1}{2}$. Assume A. Let the $\{\theta^i_k, k\ge 0, i\in \{1,\ldots,N\}\}$'s be generated either by the algorithm eq.algo-ideal, eq.algo-batch, or eq.algo-z1z2. Then, $(\mu^N)_{N\ge1}$ (see empirical_distrib) converges in $\mathbf P$-probability in $\mathcal{D}(\mathbf R_+,\ma

Figures (2)

Figure 1: Convergence of $\mathbb{V}[\langle f, \eta_t^N \rangle]$ in the simple (left column) and complex (right column) settings, for $f_{mean}$ (first row), $f_{std}$ (second row) and $f_{pred}$ (third row).
Figure 2: $\mathbb{V}[\langle f, \mu_t^N \rangle]$ as a function of $N$, in the simple (left column) and complex (right column) settings, for $f_{mean}$ (first row), $f_{std}$ (second row) and $f_{pred}$ (third row).
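
The scaling that these two figures track can be illustrated with a toy computation. The sketch below replaces the trained particles by i.i.d. standard Gaussian draws, which is an assumption made purely for illustration and not the paper's setting. The test function `tanh` and the helper name `empirical_mean_variance` are likewise hypothetical. In this toy case the variance of $\langle f, \mu^N \rangle$ decays like $1/N$, so the rescaled quantity $N \cdot \mathbb{V}[\langle f, \mu^N \rangle]$, which plays the role of the variance of the fluctuation, stays roughly constant.

```python
import numpy as np

def empirical_mean_variance(n_particles, n_runs, rng):
    # Toy stand-in: each run draws n_particles i.i.d. N(0, 1) particles,
    # evaluates <f, mu^N> = (1/N) * sum_i f(theta_i) with f = tanh,
    # and the variance is taken across the independent runs.
    theta = rng.standard_normal((n_runs, n_particles))
    return np.tanh(theta).mean(axis=1).var()

rng = np.random.default_rng(0)
widths = [100, 400, 1600]
variances = {N: empirical_mean_variance(N, 2000, rng) for N in widths}

# Rescaling by N should give approximately the same value for every width.
rescaled = {N: N * v for N, v in variances.items()}
```

For trained networks the particles are not independent, which is why a dedicated CLT is needed to identify the limiting variance.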

Theorems & Definitions (43)

Remark 1
Theorem 1: colt
Definition 1
Definition 2
Theorem 2
Lemma 1
Proposition 1
Lemma 2
proof
Lemma 3
...and 33 more
