Bayesian Inference Accelerator for Spiking Neural Networks

Prabodh Katti, Anagha Nimbekar, Chen Li, Amit Acharyya, Bashir M. Al-Hashimi, Bipin Rajendran

TL;DR

This work tackles calibrated uncertainty estimation for edge inference by designing a hardware-friendly Bayesian spiking neural network (SNN) that uses Bernoulli-distributed binary weights and time-based ensembles for Monte Carlo sampling. It proposes a software-hardware co-design flow: train a Bayesian binary ANN with full-precision Bernoulli parameters, quantize it for hardware, convert it to an SNN, and implement it on an accelerator with 64 processing elements and an LFSR-based pseudo-random number generator (PRNG) designed for maximal reuse. The approach achieves accuracy comparable to Bayesian binary networks with full-precision Bernoulli parameters while requiring up to $25\times$ fewer spikes than equivalent binary SNN implementations. Mapped onto a Zynq-7000 SoC, the design achieves a $6.5\times$ improvement in GOPS/DSP while consuming up to 30 times less power than the state of the art. Experiments on CIFAR-10 with a Bayesian ResNet-18 show accurate, well-calibrated predictions within as few as 4 timesteps, with better calibration (lower ECE) than frequentist counterparts. The work highlights a practical route to trustworthy, energy-efficient edge AI by combining Bayesian inference with spike-based computation and hardware-tailored quantization.
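Calibration is reported as expected calibration error (ECE): predictions are grouped into confidence bins, and ECE is the sample-weighted average gap between accuracy and mean confidence in each bin, i.e. ECE = sum over bins b of (|B_b| / n) * |acc(B_b) - conf(B_b)|. The sketch below implements the common equal-width-bin variant. The function name and the default of 10 bins are assumptions for illustration; this summary does not state the binning the paper uses.

```python
import math
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    predictions: Sequence[int],
    labels: Sequence[int],
    n_bins: int = 10,  # assumed bin count, not taken from the paper
) -> float:
    """Equal-width-bin ECE: sum over bins of (|B|/n) * |acc(B) - conf(B)|."""
    n = len(confidences)
    if n == 0 or not (len(predictions) == len(labels) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    # Per-bin accumulators: sample count, summed confidence, correct count.
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    correct = [0] * n_bins
    for c, p, y in zip(confidences, predictions, labels):
        # Bin b covers (b/n_bins, (b+1)/n_bins]; a confidence of 0 goes to bin 0.
        b = min(n_bins - 1, max(0, math.ceil(c * n_bins) - 1))
        counts[b] += 1
        conf_sums[b] += c
        correct[b] += int(p == y)
    ece = 0.0
    for b in range(n_bins):
        if counts[b]:
            gap = abs(correct[b] / counts[b] - conf_sums[b] / counts[b])
            ece += counts[b] / n * gap
    return ece
```

For example, two predictions made with confidence 0.9 of which only one is correct land in the same bin with accuracy 0.5, giving an ECE of about 0.4: the model is overconfident. A well-calibrated model drives this gap toward zero.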

Abstract

Bayesian neural networks offer better estimates of model uncertainty than frequentist networks. However, inference with Bayesian models requires multiple instantiations, or samples, of the network parameters, which demands significant computational resources. Compared to traditional deep learning networks, spiking neural networks (SNNs) have the potential to reduce computational area and power, thanks to their event-driven, spike-based computational framework. Most works in the literature address either frequentist SNN models or non-spiking Bayesian neural networks. In this work, we demonstrate an optimization framework for developing and implementing efficient Bayesian SNNs in hardware, additionally restricting network weights to be binary-valued to further decrease power and area consumption. We demonstrate accuracies comparable to Bayesian binary networks with full-precision Bernoulli parameters, while requiring up to $25\times$ fewer spikes than equivalent binary SNN implementations. We show the feasibility of the design by mapping it onto Zynq-7000, a lightweight SoC, and achieve a $6.5 \times$ improvement in GOPS/DSP while consuming up to 30 times less power than the state of the art.
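To make the sampling cost concrete, here is a minimal, non-spiking sketch of Monte Carlo inference with Bernoulli-distributed binary weights, in the spirit of Figure 1: each weight is +1 with probability p and -1 otherwise, the layer is re-sampled n_MC times, and the softmax outputs are averaged. The single fully connected layer and the names `sample_binary_weights`, `mc_predict`, and `predictive_entropy` are hypothetical simplifications; the paper's network is a spiking ResNet-18, whose forward pass is not reproduced here.

```python
import math
import random
from typing import List, Sequence


def sample_binary_weights(probs: Sequence[Sequence[float]], rng: random.Random) -> List[List[int]]:
    """Draw one network instance: each weight is +1 with probability p, else -1."""
    return [[1 if rng.random() < p else -1 for p in row] for row in probs]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def mc_predict(probs: Sequence[Sequence[float]], x: Sequence[float], n_mc: int, seed: int = 0) -> List[float]:
    """Average class probabilities over n_mc sampled binary-weight layers (hypothetical toy layer)."""
    rng = random.Random(seed)
    avg = [0.0] * len(probs)
    for _ in range(n_mc):
        w = sample_binary_weights(probs, rng)
        logits = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
        for k, pk in enumerate(softmax(logits)):
            avg[k] += pk / n_mc
    return avg


def predictive_entropy(p: Sequence[float]) -> float:
    """Entropy of the averaged prediction; higher means more uncertain."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)
```

For example, `mc_predict(probs, x, n_mc=10)` returns averaged class probabilities whose entropy serves as an uncertainty estimate. Each extra sample costs a full forward pass, which is the overhead that binary weights and spike-based computation aim to make affordable in hardware.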

Paper Structure

This paper contains 7 sections, 2 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Top: Bayesian SNN with Bernoulli distributed weights (i.e., weights take binary values upon sampling). Bottom: Bayesian inference is performed by the Monte Carlo (MC) method, where the network is sampled $n_{MC}$ times and the predictions are combined to get the final result.
  • Figure 2: Our optimization methodology to develop Bayesian SNNs for hardware implementation.
  • Figure 3: Design of the novel inference architecture for Bayesian binary SNNs.
  • Figure 4: (Left) Resource-efficient design of an LFSR-based PRNG with maximal reuse. Shown here are $k$ rows and four 8-bit pseudo-random numbers taken from each 32-bit LFSR. As our design needs 64 RNs in a single clock, $k=16$ such LFSRs are sufficient. RNs generated from the PRNG block are then utilized by the Bernoulli RN generating block (right) to generate the Bernoulli weight. A two-bit representation is used for the product of {-1,+1} weights with {0,1} spikes.
  • Figure 5: Classification performance (top) and ECE (bottom) on the CIFAR-10 dataset at the various stages of optimization shown in Figure 2, as well as for a frequentist counterpart.
  • ...and 1 more figure
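The PRNG scheme in Figure 4 can be sketched as follows: each 32-bit LFSR yields four 8-bit random numbers per step, so 16 LFSRs supply the 64 numbers needed per clock, and each Bernoulli weight is +1 when its random number falls below an 8-bit quantized probability. Several details are assumptions, not taken from the paper: the feedback taps (a maximal-length 32-bit Galois LFSR with taps 32, 22, 2, 1), one LFSR step per clock, the comparison rule, and the two's-complement 2-bit code for the product of a {-1,+1} weight and a {0,1} spike. All names are hypothetical.

```python
from typing import List, Sequence

# Assumed feedback mask for a maximal-length 32-bit Galois LFSR (taps 32, 22, 2, 1);
# the paper's actual polynomial is not given in this summary.
TAP_MASK = 0x80200003


class Lfsr32:
    """Right-shifting Galois LFSR; a non-zero state never reaches zero."""

    def __init__(self, seed: int):
        if seed & 0xFFFFFFFF == 0:
            raise ValueError("LFSR seed must be non-zero")
        self.state = seed & 0xFFFFFFFF

    def step(self) -> int:
        lsb = self.state & 1
        self.state >>= 1
        if lsb:
            self.state ^= TAP_MASK
        return self.state

    def four_bytes(self) -> List[int]:
        """Advance once (assumed: one step per clock) and split the state into four 8-bit numbers."""
        s = self.step()
        return [(s >> (8 * i)) & 0xFF for i in range(4)]


def random_numbers_per_clock(lfsrs: Sequence[Lfsr32]) -> List[int]:
    """Collect 4 RNs from each LFSR; 16 LFSRs give 64 RNs."""
    out: List[int] = []
    for lfsr in lfsrs:
        out.extend(lfsr.four_bytes())
    return out


def bernoulli_weight(rn: int, p_q: int) -> int:
    """Return +1 with probability p_q / 256 for a uniform 8-bit rn, else -1 (assumed rule)."""
    return 1 if rn < p_q else -1


def encode_product(w: int, spike: int) -> int:
    """Hypothetical 2-bit two's-complement code of w * spike: 0b00 = 0, 0b01 = +1, 0b11 = -1."""
    return (w * spike) & 0b11


def decode_product(code: int) -> int:
    """Sign-extend the 2-bit code back to {-1, 0, +1}."""
    return code - 4 if code & 0b10 else code
```

For example, `random_numbers_per_clock([Lfsr32(seed=i + 1) for i in range(16)])` returns the 64 random numbers for one clock. Slicing a single wide LFSR into several bytes trades some statistical independence between the numbers for far fewer registers, which is the reuse trade-off the figure describes.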