Function-Space MCMC for Bayesian Wide Neural Networks
Lucia Pezzetti, Stefano Favaro, Stefano Peluchetti
TL;DR
The paper tackles uncertainty quantification in Bayesian Neural Networks (BNNs) by studying function-space MCMC sampling of a reparameterized weight posterior that becomes approximately Gaussian as the width grows. It proves that the acceptance probabilities of the preconditioned Crank-Nicolson (pCN) sampler and its Langevin variant (pCNL) converge to 1 in the wide-network limit, independently of the stepsize. It also shows a higher effective sample size and better diagnostics than underdamped Langevin Monte Carlo (LMC). Empirical results on CIFAR-10 show that pCN (and, to a lesser extent, pCNL) scales favorably with width while LMC deteriorates, making pCN the preferred method for very wide BNNs. A marginal-conditional decomposition further reduces the effective sampling dimensionality, and real-world experiments corroborate the theoretical benefits, highlighting the practical impact for scalable Bayesian inference in wide neural models.
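The mechanism behind the stepsize-independent acceptance result can be illustrated with a minimal sketch. It assumes, for illustration only, that the reparameterized prior on the weights is a standard Gaussian N(0, I) and that the data enter through a negative log-likelihood potential `phi`. The pCN proposal leaves the prior invariant, so the prior terms cancel in the Metropolis-Hastings ratio and only `phi` remains. A plain random-walk Metropolis step is included purely as a familiar contrast; it is not one of the paper's baselines. The names `pcn`, `rwm`, `phi`, `beta` and `step` are hypothetical, and `beta` is assumed to lie in (0, 1].

```python
import numpy as np


def pcn(phi, x0, beta, n_steps, rng):
    """pCN sampler for pi(x) proportional to exp(-phi(x)) * N(x; 0, I).

    Illustrative assumption: the reparameterized prior is N(0, I) and
    phi is the negative log-likelihood of the data.
    """
    x = np.asarray(x0, dtype=float)
    phi_x = phi(x)
    rho = np.sqrt(1.0 - beta**2)
    samples, accepted = [], 0
    for _ in range(n_steps):
        # Autoregressive proposal that leaves the prior N(0, I) invariant.
        x_prop = rho * x + beta * rng.standard_normal(x.shape)
        phi_prop = phi(x_prop)
        # The prior terms cancel, so only the likelihood enters the ratio.
        if np.log(rng.uniform()) < phi_x - phi_prop:
            x, phi_x = x_prop, phi_prop
            accepted += 1
        samples.append(x.copy())
    return np.array(samples), accepted / n_steps


def rwm(phi, x0, step, n_steps, rng):
    """Random-walk Metropolis on the same target, included only as a contrast."""
    x = np.asarray(x0, dtype=float)
    log_pi = -phi(x) - 0.5 * x @ x
    accepted = 0
    for _ in range(n_steps):
        x_prop = x + step * rng.standard_normal(x.shape)
        log_pi_prop = -phi(x_prop) - 0.5 * x_prop @ x_prop
        # Here the prior does not cancel, and its ratio degrades with dimension.
        if np.log(rng.uniform()) < log_pi_prop - log_pi:
            x, log_pi = x_prop, log_pi_prop
            accepted += 1
    return accepted / n_steps
```

With a flat potential (`phi = lambda x: 0.0`), `pcn` accepts every proposal with `beta=0.9` in both 10 and 10,000 dimensions. By contrast, `rwm` started at the origin with the same step almost never moves in 10,000 dimensions. This loosely mirrors the wide-network regime, in which the posterior is close to its Gaussian reference measure.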
Abstract
Bayesian Neural Networks represent a fascinating confluence of deep learning and probabilistic reasoning, offering a compelling framework for understanding uncertainty in complex predictive models. In this paper, we investigate the use of the preconditioned Crank-Nicolson algorithm and its Langevin version to sample from a reparametrised posterior distribution of the neural network's weights as the network width grows. These algorithms are robust in the infinite-dimensional setting; in addition, we prove that their acceptance probabilities approach 1 as the width of the network increases, independently of any stepsize tuning. Moreover, we examine and compare how the network width affects the mixing speeds of the underdamped Langevin Monte Carlo, the preconditioned Crank-Nicolson and the preconditioned Crank-Nicolson Langevin samplers in real-world settings. Our findings suggest that, in wide Bayesian Neural Network configurations, the preconditioned Crank-Nicolson algorithm allows for a scalable and more efficient sampling of the reparametrised posterior distribution. This is evidenced by a higher effective sample size and improved diagnostic results compared with the other analysed algorithms.
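The Langevin version adds a gradient drift to the pCN proposal. The following is one way to write it, assuming again a standard Gaussian prior (C = I), a potential `phi` with gradient `grad_phi`, and a stepsize `delta > 0`. The proposal is the Crank-Nicolson discretisation of Langevin dynamics. For transparency, the acceptance step evaluates the generic Metropolis-Hastings ratio with the Gaussian proposal densities rather than a closed-form expression. All names are hypothetical, and this is a sketch, not the paper's implementation.

```python
import numpy as np


def pcnl(phi, grad_phi, x0, delta, n_steps, rng):
    """Sketch of a pCN-Langevin sampler for pi(x) prop. to exp(-phi(x)) * N(x; 0, I).

    Assumption: standard Gaussian prior (C = I). Proposal:
        x' = [(2 - delta) x - 2 delta grad_phi(x) + sqrt(8 delta) xi] / (2 + delta)
    """
    a = (2.0 - delta) / (2.0 + delta)         # contraction towards the prior mean
    b = 2.0 * delta / (2.0 + delta)           # coefficient of the gradient drift
    s = np.sqrt(8.0 * delta) / (2.0 + delta)  # noise scale; a^2 + s^2 = 1 algebraically

    def log_target(x):
        return -phi(x) - 0.5 * x @ x

    def log_q(to, frm):
        # log N(to; a*frm - b*grad_phi(frm), s^2 I), up to a constant that cancels
        diff = to - (a * frm - b * grad_phi(frm))
        return -0.5 * (diff @ diff) / s**2

    x = np.asarray(x0, dtype=float)
    samples, accepted = [], 0
    for _ in range(n_steps):
        x_prop = a * x - b * grad_phi(x) + s * rng.standard_normal(x.shape)
        # Generic Metropolis-Hastings correction for the non-symmetric proposal.
        log_ratio = (log_target(x_prop) + log_q(x, x_prop)
                     - log_target(x) - log_q(x_prop, x))
        if np.log(rng.uniform()) < log_ratio:
            x = x_prop
            accepted += 1
        samples.append(x.copy())
    return np.array(samples), accepted / n_steps
```

Since `a**2 + s**2` equals 1 algebraically, the zero-gradient proposal leaves N(0, I) invariant. A flat `phi` is therefore accepted essentially every time, up to floating-point rounding. For a toy Gaussian likelihood such as `phi(x) = 0.5 * ||x - y||^2`, the chain targets the Gaussian posterior with mean `y / 2`.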
