Analog Bayesian neural networks are insensitive to the shape of the weight distribution
Ravi G. Patel, T. Patrick Xiao, Sapan Agarwal, Christopher Bennett
TL;DR
This work investigates training Bayesian neural networks (BNNs) with mean-field variational inference (MFVI) on analog hardware by using real device noise as the variational distribution. It shows empirically that, for fixed weight means and variances, the predictive posteriors converge regardless of the shape of the noise distribution, and that this convergence strengthens in wider networks because of the central limit theorem. The authors develop numerical methods (maximum-likelihood noise fitting, custom quadrature, and inverse transform sampling) that enable MFVI with non-Gaussian device distributions such as Bayes-MTJ noise. Results on scalar regression, a synthetic energy-distance task, and UTKFace show that shape differences in the variational distribution have little practical impact on predictive performance, suggesting that hardware designers can focus on controlling the mean and variance while deploying Gaussian-trained parameters in hardware. The work supports energy-efficient BNNs on stochastic analog hardware by providing principled methods for accommodating device-noise distributions without sacrificing uncertainty quantification.
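As an illustration of the inverse transform sampling step mentioned above, the following is a minimal sketch rather than the authors' implementation. It assumes the noise shape is available as density values tabulated on a uniform grid; the bimodal density, the grid limits, and the function names `fit_shape_table` and `sample_weights` are hypothetical and are not taken from the paper or from Bayes-MTJ measurements. The sketch standardizes the tabulated shape to zero mean and unit variance, then draws weights as `mu + sigma * eps`.

```python
import numpy as np


def fit_shape_table(grid, pdf):
    """Return (cdf, mean, std) for a density tabulated on a uniform grid."""
    dx = grid[1] - grid[0]
    # Cumulative trapezoid rule gives the (unnormalized) CDF at each grid point.
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (pdf[1:] + pdf[:-1]) * dx)))
    total = cdf[-1]
    cdf = cdf / total
    # Approximate probability mass attached to each grid point.
    mass = pdf * dx / total
    mean = np.sum(grid * mass)
    std = np.sqrt(np.sum((grid - mean) ** 2 * mass))
    return cdf, mean, std


def sample_weights(rng, mu, sigma, grid, cdf, mean, std, size):
    """Draw weights with mean mu and standard deviation sigma from the tabulated shape."""
    u = rng.uniform(size=size)
    raw = np.interp(u, cdf, grid)  # inverse CDF by linear interpolation
    eps = (raw - mean) / std       # standardize to zero mean, unit variance
    return mu + sigma * eps


# Hypothetical bimodal noise shape (assumption, for illustration only).
grid = np.linspace(-6.0, 6.0, 4001)
pdf = (0.7 * np.exp(-0.5 * ((grid + 0.8) / 0.6) ** 2)
       + 0.3 * np.exp(-0.5 * ((grid - 1.5) / 0.4) ** 2))
cdf, shape_mean, shape_std = fit_shape_table(grid, pdf)

rng = np.random.default_rng(0)
w = sample_weights(rng, 0.3, 0.05, grid, cdf, shape_mean, shape_std, 200_000)
print(w.mean(), w.std())
```

Separating the shape (the standardized `eps`) from the location and scale (`mu`, `sigma`) mirrors the reparameterization commonly used in MFVI, so the same trainable parameters can be paired with different noise shapes.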
Abstract
Recent work has demonstrated that Bayesian neural networks (BNNs) trained with mean-field variational inference (MFVI) can be implemented in analog hardware, promising orders of magnitude energy savings compared to standard digital implementations. However, while Gaussians are typically used as the variational distribution in MFVI, it is difficult to precisely control the shape of the noise distributions produced by sampling analog devices. This paper introduces a method for MFVI training using real device noise as the variational distribution. Furthermore, we demonstrate empirically that the predictive distributions from BNNs with the same weight means and variances converge to the same distribution, regardless of the shape of the variational distribution. This result suggests that analog device designers do not need to consider the shape of the device noise distribution when implementing MFVI-trained BNNs in hardware.
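The convergence claim can be illustrated with a toy experiment, offered here as a sketch under stated assumptions and not as a reproduction of the paper's results. A single linear pre-activation `z = sum_i w_i x_i` is sampled with weights `w_i = mu_i + sigma * eps_i`, once with Gaussian `eps` and once with a skewed `eps` (a centered exponential, which also has zero mean and unit variance). The inputs, means, value of `sigma`, and the helper names `ks_distance` and `preactivation_gap` are all hypothetical. The two-sample Kolmogorov-Smirnov distance between the resulting distributions of `z` is then compared across layer widths.

```python
import numpy as np


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    a = np.sort(a)
    b = np.sort(b)
    pts = np.concatenate((a, b))
    cdf_a = np.searchsorted(a, pts, side="right") / a.size
    cdf_b = np.searchsorted(b, pts, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def preactivation_gap(width, n_samples=20_000, seed=0):
    """KS distance between pre-activations under Gaussian and skewed weight noise."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=width)               # fixed input vector (assumption)
    mu = rng.normal(scale=0.5, size=width)   # weight means (assumption)
    sigma = 0.2                              # shared weight std (assumption)
    eps_gauss = rng.normal(size=(n_samples, width))
    # Centered exponential: zero mean, unit variance, but strongly skewed.
    eps_skew = rng.exponential(size=(n_samples, width)) - 1.0
    z_gauss = (mu + sigma * eps_gauss) @ x
    z_skew = (mu + sigma * eps_skew) @ x
    return ks_distance(z_gauss, z_skew)


gaps = {width: preactivation_gap(width) for width in (1, 16, 256)}
for width, gap in gaps.items():
    print(f"width={width:4d}  KS distance={gap:.3f}")
```

With matched means and variances, the gap between the two pre-activation distributions should shrink as the width grows, which is the behavior the central limit theorem predicts for sums of independent weight perturbations. A full network adds nonlinearities and depth, which this single-neuron sketch does not model.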
