Bayesian Inference Accelerator for Spiking Neural Networks
Prabodh Katti, Anagha Nimbekar, Chen Li, Amit Acharyya, Bashir M. Al-Hashimi, Bipin Rajendran
TL;DR
This work tackles the problem of obtaining calibrated uncertainty estimates for edge inference. It does so by designing a hardware-friendly Bayesian spiking neural network (SNN) that uses Bernoulli-distributed binary weights and time-based ensembles for Monte Carlo sampling. It proposes a software-hardware co-design flow: train a Bayesian binary ANN with full-precision Bernoulli parameters, quantize these parameters for hardware, convert the network to an SNN, and implement it on an accelerator with 64 processing elements and pseudo-random number generator (PRNG) reuse. The approach achieves accuracy comparable to Bayesian binary networks with full-precision Bernoulli parameters while requiring up to $25\times$ fewer spikes than equivalent binary SNN implementations. Its implementation on the Zynq-7000 SoC shows favorable GOPS/DSP and power figures. Experiments with a Bayesian ResNet-18 on CIFAR-10 show accurate, well-calibrated predictions within as few as 4 timesteps, with lower expected calibration error (ECE) than frequentist counterparts. The work highlights a practical route to trustworthy, energy-efficient edge AI by combining Bayesian inference with spike-based computation and hardware-tailored quantization.
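Calibration is measured here with expected calibration error (ECE). A common definition sorts predictions into $M$ confidence bins $B_m$ and computes $\mathrm{ECE} = \sum_{m} \frac{|B_m|}{n}\,|\mathrm{acc}(B_m) - \mathrm{conf}(B_m)|$. The sketch below shows one way to compute it. It assumes 10 equal-width bins, and the function name is hypothetical. It is not taken from the paper's evaluation code, and the paper's exact binning may differ.

```python
def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    """Equal-width-binned ECE (a sketch; the binning scheme is an assumption).

    confidences: predictive probability of the predicted class, in [0, 1]
    predictions: predicted class per example
    labels: true class per example
    """
    n = len(confidences)
    if n == 0 or not (len(predictions) == len(labels) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    # Each bin collects (confidence, is_correct) pairs.
    bins = [[] for _ in range(n_bins)]
    for conf, pred, label in zip(confidences, predictions, labels):
        # Bins are [i/M, (i+1)/M); a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, pred == label))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of examples.
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece
```

For example, consider four predictions made with confidence 0.75, three of which are correct. They are perfectly calibrated, so the ECE is 0. Two predictions made with confidence 1.0, only one of which is correct, give an ECE of 0.5. Lower is better, which is the sense in which a Bayesian model "improves" calibration over a frequentist one.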
Abstract
Bayesian neural networks offer better estimates of model uncertainty than frequentist networks. However, inference with Bayesian models requires multiple instantiations, or samples, of the network parameters, which demands significant computational resources. Compared to traditional deep learning networks, spiking neural networks (SNNs) have the potential to reduce area and power consumption, thanks to their event-driven, spike-based computational framework. Most works in the literature address either frequentist SNN models or non-spiking Bayesian neural networks. In this work, we demonstrate an optimization framework for developing and implementing efficient Bayesian SNNs in hardware. We additionally restrict network weights to binary values to further decrease power and area consumption. We demonstrate accuracies comparable to Bayesian binary networks with full-precision Bernoulli parameters, while requiring up to $25\times$ fewer spikes than equivalent binary SNN implementations. We show the feasibility of the design by mapping it onto Zynq-7000, a lightweight SoC. The resulting design achieves a $6.5\times$ improvement in GOPS/DSP while consuming up to $30\times$ less power than the state of the art.
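The cost of "multiple instantiations or sampling of the network parameters" can be illustrated with a single fully connected layer. In this layer, each binary weight is drawn as $+1$ with probability $p_{ij}$ and $-1$ otherwise. The sketch below is a generic Monte Carlo estimate of a Bayesian binary network's predictive distribution. It averages softmax outputs over several weight samples. It is not the paper's SNN or accelerator datapath. The layer shape, the $\pm 1$ encoding, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def sample_binary_weights(probs, rng):
    """Draw one weight matrix: w_ij = +1 with probability probs[i][j], else -1."""
    return [[1 if rng.random() < p else -1 for p in row] for row in probs]


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_predict(x, probs, n_samples, seed=0):
    """Average softmax outputs over n_samples binary-weight instantiations.

    x: input vector of length n_in
    probs: n_out x n_in matrix of Bernoulli parameters P(w_ij = +1)
    """
    rng = random.Random(seed)
    avg = [0.0] * len(probs)
    for _ in range(n_samples):
        w = sample_binary_weights(probs, rng)
        # With +/-1 weights, each dot product needs only additions and subtractions.
        logits = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
        for k, pk in enumerate(softmax(logits)):
            avg[k] += pk / n_samples
    return avg


probs = [[0.9, 0.8, 0.1], [0.2, 0.5, 0.5]]
print(mc_predict([1.0, 0.5, -0.5], probs, n_samples=50))
```

Each extra sample costs one more full forward pass, which is the overhead that the binary weights and spike-based computation described above aim to make cheap in hardware. The spread of the sampled outputs, or the confidence of their average, is what an uncertainty metric such as ECE is then computed from.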
