Analog Bayesian neural networks are insensitive to the shape of the weight distribution

Ravi G. Patel; T. Patrick Xiao; Sapan Agarwal; Christopher Bennett

TL;DR

This work investigates training Bayesian neural networks (BNNs) with mean-field variational inference (MFVI) for analog hardware, using real device noise as the variational distribution. It shows empirically that, for fixed weight means and variances, the predictive distributions converge to the same distribution regardless of the shape of the noise, and that the agreement is closer in wider networks, consistent with the central limit theorem. The authors develop three numerical methods to enable MFVI with non-Gaussian device distributions such as Bayes-MTJ noise: maximum-likelihood fitting of the device noise, custom quadrature, and inverse transform sampling. Results on scalar regression, a synthetic energy-distance task, and UTKFACE show that differences in the shape of the variational distribution have little practical impact on predictive performance. This suggests that hardware designers can focus on controlling the mean and variance of the device noise, and that parameters trained with Gaussian variational distributions can be reused in hardware. This work supports energy-efficient BNNs on stochastic analog hardware by providing principled methods to accommodate device noise distributions without sacrificing uncertainty quantification.
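Inverse transform sampling, one of the methods named above, turns uniform random numbers into draws from an arbitrary one-dimensional distribution by passing them through its inverse CDF. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the device noise shape is available as a density tabulated on a grid, and that each weight's mean and standard deviation are set by shifting and scaling that base shape. The bimodal density and all function names (`tabulated_moments`, `make_inverse_cdf_sampler`, `device_weight_samples`) are hypothetical stand-ins, not Bayes-MTJ data or an API from the paper.

```python
import numpy as np

def _trapezoid(y, x):
    """Trapezoidal integral of samples y over the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def tabulated_moments(x_grid, density):
    """Mean and standard deviation of a (possibly unnormalized) tabulated density."""
    mass = _trapezoid(density, x_grid)
    mean = _trapezoid(x_grid * density, x_grid) / mass
    var = _trapezoid((x_grid - mean) ** 2 * density, x_grid) / mass
    return mean, np.sqrt(var)

def make_inverse_cdf_sampler(x_grid, density):
    """Sampler for a density tabulated on x_grid via inverse transform sampling."""
    # Cumulative trapezoidal integral gives the CDF at each grid point
    increments = 0.5 * (density[1:] + density[:-1]) * np.diff(x_grid)
    cdf = np.concatenate([[0.0], np.cumsum(increments)])
    cdf /= cdf[-1]  # normalize so the CDF ends exactly at 1

    def sample(n, rng):
        u = rng.uniform(0.0, 1.0, size=n)
        # Interpolating x as a function of the CDF evaluates the inverse CDF
        return np.interp(u, cdf, x_grid)

    return sample

def device_weight_samples(sampler, base_mean, base_std, mu, sigma, n, rng):
    """Shift and scale base-shape samples so they have mean mu and std sigma."""
    z = sampler(n, rng)
    return mu + sigma * (z - base_mean) / base_std

rng = np.random.default_rng(0)
x = np.linspace(-4.0, 4.0, 2001)
# Hypothetical bimodal density standing in for measured device noise
dens = np.exp(-0.5 * ((x - 1.0) / 0.5) ** 2) + np.exp(-0.5 * ((x + 1.0) / 0.5) ** 2)
base_mean, base_std = tabulated_moments(x, dens)
sampler = make_inverse_cdf_sampler(x, dens)
w = device_weight_samples(sampler, base_mean, base_std, mu=0.3, sigma=0.1, n=100_000, rng=rng)
print(w.mean(), w.std())
```

Because the CDF is built once on the grid, each draw costs only a uniform sample and a binary-search interpolation, and the same base shape can be reused for every weight with a different (mean, standard deviation) pair.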

Abstract

Recent work has demonstrated that Bayesian neural networks (BNNs) trained with mean field variational inference (MFVI) can be implemented in analog hardware, promising orders-of-magnitude energy savings compared with standard digital implementations. However, while Gaussians are typically used as the variational distribution in MFVI, it is difficult to precisely control the shape of the noise distributions produced by sampling analog devices. This paper introduces a method for MFVI training using real device noise as the variational distribution. Furthermore, we demonstrate empirically that the predictive distributions from BNNs with the same weight means and variances converge to the same distribution, regardless of the shape of the variational distribution. This result suggests that analog device designers do not need to consider the shape of the device noise distribution when implementing BNNs that perform MFVI in hardware.
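The convergence claim can be illustrated with a toy calculation. The sketch below is an assumption-laden illustration, not the paper's experimental setup: it looks at a single linear pre-activation y = Σ_i w_i x_i with a fixed input, draws each weight with the same mean and standard deviation from either a Gaussian or a uniform shape, and compares the excess kurtosis of y (zero for a Gaussian). The widths, parameter ranges, and the choice of a uniform shape are arbitrary; the helper names are hypothetical.

```python
import numpy as np

def standardized_noise(shape, size, rng):
    """Zero-mean, unit-variance noise of an (illustrative) shape."""
    if shape == "gaussian":
        return rng.standard_normal(size)
    if shape == "uniform":
        # U(-sqrt(3), sqrt(3)) has mean 0 and variance 1
        return rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size)
    raise ValueError(f"unknown shape: {shape}")

def preactivation_samples(shape, mu, sigma, x, n_samples, rng):
    """Sample y = sum_i w_i * x_i with w_i = mu_i + sigma_i * eps_i."""
    eps = standardized_noise(shape, (n_samples, mu.size), rng)
    w = mu + sigma * eps  # same means and std devs for every shape
    return w @ x

def excess_kurtosis(y):
    """Sample excess kurtosis; 0 for a Gaussian, -1.2 for a uniform."""
    z = (y - y.mean()) / y.std()
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(0)
results = {}
for width in (1, 16, 256):
    mu = rng.normal(0.0, 0.5, width)
    sigma = rng.uniform(0.05, 0.15, width)
    x = rng.normal(0.0, 1.0, width)
    for shape in ("gaussian", "uniform"):
        y = preactivation_samples(shape, mu, sigma, x, 20_000, rng)
        results[(shape, width)] = excess_kurtosis(y)
        print(f"width={width:4d} {shape:8s} excess kurtosis={results[(shape, width)]:+.3f}")
```

For independent weights, fourth cumulants add, so the excess kurtosis of y equals κ Σσ_i⁴x_i⁴ / (Σσ_i²x_i²)², where κ is the excess kurtosis of the standardized weight shape. This ratio shrinks roughly like 1/width, which is the central-limit-theorem mechanism that makes the shape of the weight noise matter less in wider layers.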

Paper Structure

This paper contains 16 sections, 15 equations, and 4 figures.

Introduction
Predictive distributions using device weights
Mean field variational inference with device distributions
Numerical Approximations
Maximum likelihood fit to device noise
Quadrature
Inverse sampling
Assessing the impact of diverse variational distributions on neural network performance
Energy distance minimization
Scalar Regression
UTKFACE
Discussion and Conclusions
Formula for $P_2$ and $\hat{G}_2$
Further details on quadrature
Details on training and architectures
...and 1 more section
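The outline lists a maximum likelihood fit to device noise, and Figures 1 and 3 compare device noise with a distribution fit to it. The parametric family used in the paper is not given on this page, so the sketch below assumes a two-component Gaussian mixture fit by expectation maximization (EM), which increases the likelihood at every iteration. The data are synthetic and `fit_gmm_1d` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def fit_gmm_1d(samples, n_iter=200):
    """Maximum-likelihood fit of a two-component 1-D Gaussian mixture via EM."""
    x = np.asarray(samples, dtype=float)
    # Deterministic initialization: means at the quartiles, shared variance
    means = np.percentile(x, [25, 75]).astype(float)
    variances = np.full(2, x.var())
    weights = np.full(2, 0.5)
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] proportional to pi_k * N(x_n | m_k, v_k)
        log_p = (np.log(weights)
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exp
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form weighted maximum-likelihood updates
        nk = r.sum(axis=0)
        weights = nk / x.size
        means = (r * x[:, None]).sum(axis=0) / nk
        variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, variances

rng = np.random.default_rng(1)
n = 20_000
# Hypothetical device noise: about 30% of samples near -1.0, 70% near +0.8
from_left = rng.uniform(size=n) < 0.3
samples = np.where(from_left, rng.normal(-1.0, 0.2, n), rng.normal(0.8, 0.3, n))
weights, means, variances = fit_gmm_1d(samples)
print(weights, means, np.sqrt(variances))
```

A fitted closed-form density like this can then be evaluated on a grid wherever the numerical steps need a smooth density rather than raw device samples.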

Figures (4)

Figure 1: (Top) Comparison between device noise and distribution fit to device noise. (Bottom) The three variational distributions examined in this work.
Figure 2: Squared difference between the approximation at each order and the approximation at the previous order. Custom quadratures (orange) converge faster than standard quadratures (blue) in estimating the variance (top) and the KL divergence to a Gaussian (bottom).
Figure 3: (Top) Comparison between device noise and distribution fit to device noise. (Bottom) A Gaussian variational distribution and the device distribution.
Figure 4: Comparison of predictive distributions from Gaussian and device variational distributions for energy distance problem at different widths and depths.
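Figure 2 compares custom and standard quadratures, but this page does not describe how the custom rules are constructed. One standard way to build a quadrature tailored to a non-Gaussian density is a Gauss rule whose weight function is that density, computed with the discretized Stieltjes procedure and the Golub-Welsch eigenvalue method. The sketch below assumes this construction, which may differ from the paper's; the density is a hypothetical stand-in and `gauss_rule_for_density` is an illustrative name.

```python
import numpy as np

def gauss_rule_for_density(x_grid, density, n_nodes):
    """n-point Gauss rule for a density tabulated on x_grid.

    Uses the discretized Stieltjes procedure for the recurrence
    coefficients and the Golub-Welsch eigenvalue method for nodes/weights.
    """
    # Discrete measure: trapezoid weights times density, normalized to mass 1
    dx = np.diff(x_grid)
    w = np.zeros_like(x_grid)
    w[:-1] += 0.5 * dx
    w[1:] += 0.5 * dx
    w = w * density
    w /= w.sum()

    alpha = np.zeros(n_nodes)
    beta = np.zeros(n_nodes)  # beta[0] unused because the total mass is 1
    p_prev = np.zeros_like(x_grid)
    p = np.ones_like(x_grid)
    norm_prev = 1.0
    for k in range(n_nodes):
        norm = np.sum(w * p * p)
        alpha[k] = np.sum(w * x_grid * p * p) / norm
        if k > 0:
            beta[k] = norm / norm_prev
        # Three-term recurrence for the next monic orthogonal polynomial
        p_prev, p = p, (x_grid - alpha[k]) * p - beta[k] * p_prev
        norm_prev = norm

    # Symmetric tridiagonal Jacobi matrix built from the recurrence coefficients
    off = np.sqrt(beta[1:])
    jacobi = np.diag(alpha) + np.diag(off, 1) + np.diag(off, -1)
    nodes, vecs = np.linalg.eigh(jacobi)
    weights = vecs[0, :] ** 2  # mass is 1, so weights sum to 1
    return nodes, weights

x = np.linspace(-4.0, 4.0, 2001)
# Hypothetical bimodal density standing in for a fitted device distribution
dens = np.exp(-0.5 * ((x - 1.0) / 0.5) ** 2) + np.exp(-0.5 * ((x + 1.0) / 0.5) ** 2)
nodes, qw = gauss_rule_for_density(x, dens, 5)
print(np.sum(qw * nodes ** 2))  # second moment of the density
for n in (2, 4, 8):
    nd, wt = gauss_rule_for_density(x, dens, n)
    print(n, np.sum(wt * np.tanh(nd + 0.5)))  # expectation of a smooth nonlinearity
```

An n-node rule of this kind integrates every polynomial up to degree 2n - 1 exactly against the tabulated density. That is one reason a rule tailored to the density can need fewer nodes than a generic rule built for a Gaussian weight when the density is far from Gaussian.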
