Bayes2IMC: In-Memory Computing for Bayesian Binary Neural Networks

Prabodh Katti, Clement Ruah, Osvaldo Simeone, Bashir M. Al-Hashimi, Bipin Rajendran

TL;DR

Bayes2IMC tackles the high resource cost of uncertainty estimation in Bayesian neural networks by embedding variationally trained binary weights into an in-memory computing crossbar that exploits PCM device stochasticity for on-the-fly synaptic sampling. It eliminates pre-neuron ADCs through a crossbar split into a weight plane (WP) and a noise plane (NP), uses a reparameterization so that binary weights can be sampled from Gaussian noise, and employs a hardware-software co-optimized logit correction plus a drift compensation scheme to stabilize performance. The work demonstrates CIFAR-10 classification with a VGGBinaryConnect model, achieving accuracy close to ideal software and favorable calibration, while projecting a $3.8$ to $9.6\times$ gain in total efficiency (GOPS/W/mm$^2$) and a $2.2$ to $5.6\times$ gain in power efficiency (GOPS/W) over an equivalent SRAM baseline. These results position Bayes2IMC as a competitive approach for energy-efficient, uncertainty-aware edge inference and motivate extension to other noisy NVM compute platforms.

Abstract

Bayesian Neural Networks (BNNs) provide superior estimates of uncertainty by generating an ensemble of predictive distributions. However, inference via ensembling is resource-intensive, requiring additional entropy sources to generate stochasticity, which increases resource consumption. We introduce Bayes2IMC, an in-memory computing (IMC) architecture designed for binary Bayesian neural networks that leverages nanoscale device stochasticity to generate desired distributions. Our novel approach utilizes Phase-Change Memory (PCM) to harness inherent noise characteristics, enabling the creation of a binary neural network. This design eliminates the necessity for a pre-neuron Analog-to-Digital Converter (ADC), significantly improving power and area efficiency. We also develop a hardware-software co-optimized correction method applied solely on the logits in the final layer to reduce device-induced accuracy variations across deployments on hardware. Additionally, we devise a simple compensation technique that ensures no drop in classification accuracy despite conductance drift of PCM. We validate the effectiveness of our approach on the CIFAR-10 dataset with a VGGBinaryConnect model, achieving accuracy metrics comparable to ideal software implementations as well as results reported in the literature using other technologies. Finally, we present a complete core architecture and compare its projected power, performance, and area efficiency against an equivalent SRAM baseline, showing a $3.8$ to $9.6\times$ improvement in total efficiency (in GOPS/W/mm$^2$) and a $2.2$ to $5.6\times$ improvement in power efficiency (in GOPS/W). In addition, the projected hardware performance of Bayes2IMC surpasses that of most of the BNN architectures based on memristive devices reported in the literature, and achieves up to $20\%$ higher power efficiency compared to the state-of-the-art.
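
To make the ensembling step concrete, the following is a minimal Python sketch of how the outputs of several stochastic forward passes could be combined into a predicted class, a confidence, and an uncertainty score. The function names (`softmax`, `ensemble_predict`) and the choice of predictive entropy as the uncertainty measure are assumptions made for illustration; they are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(logit_samples):
    """Combine N_MC stochastic forward passes by averaging their softmax outputs.

    logit_samples: list of N_MC logit vectors, one per sampled set of weights.
    Returns (predicted_class, confidence, entropy). Using the entropy of the
    averaged distribution as the uncertainty score is an assumption here.
    """
    n_mc = len(logit_samples)
    probs = [softmax(logits) for logits in logit_samples]
    num_classes = len(probs[0])

    # Monte Carlo estimate of the predictive distribution.
    mean = [sum(p[c] for p in probs) / n_mc for c in range(num_classes)]

    predicted = max(range(num_classes), key=mean.__getitem__)
    confidence = mean[predicted]
    entropy = -sum(q * math.log(q) for q in mean if q > 0.0)
    return predicted, confidence, entropy
```

Averaging probabilities rather than logits keeps the result a valid distribution, so the confidence and entropy can be read directly from it.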

Paper Structure

This paper contains 21 sections, 17 equations, 17 figures, 3 tables.

Figures (17)

  • Figure 1: Top: Illustration of a Bayesian neural network (BNN) where weights take binary values upon sampling. Middle: Bayesian inference is performed by an ensemble of $N_{MC}$ predictions combined through Monte Carlo sampling to obtain predicted class, prediction confidence, and uncertainty. Bottom: Block diagram of the proposed Bayes2IMC core architecture implementing BNN inference. The crossbar array of memristive devices is divided into a weight plane (WP) and a noise plane (NP). The WP stores the parameters $z_{w_{ji}}$ obtained by reparametrizing the probability parameters $p_{w_{ji}}$, and the NP generates the stochasticity required for synaptic sampling. Binary weights $w_{ji}$ are then generated by comparing these variables in hardware. Unlike traditional IMC architectures, the input $x_j$, the $j^{th}$ element of the input vector $\mathbf{x}$, is accumulated based on the sign of $w_{ji}$.
  • Figure 2: Dataset: CIFAR10 (IND); Prediction: Correct
  • Figure 3: Dataset: CIFAR10 (IND); Prediction: Incorrect
  • Figure 4: Dataset: CIFAR100 (OOD)
  • Figure 5: Dataset: CIFAR100 (OOD)
  • ...and 12 more figures
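
The sampling scheme described in the Figure 1 caption can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes $p_{w_{ji}}$ is the probability that $w_{ji} = +1$, that the reparametrization is $z = \Phi^{-1}(p)$ with $\Phi$ the standard normal CDF, and that the noise plane behaves like a standard Gaussian source. The helper names are hypothetical.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance (assumed noise model)


def reparametrize(p):
    """Map p = P(w = +1) to a threshold z, assuming z = Phi^{-1}(p).

    Requires 0 < p < 1; NormalDist.inv_cdf raises StatisticsError otherwise.
    """
    return _STD_NORMAL.inv_cdf(p)


def sample_weight(z, rng):
    """Compare a fresh Gaussian noise draw with z to get a weight in {-1, +1}.

    Under the assumptions above, P(noise < z) = Phi(z) = p.
    """
    noise = rng.gauss(0.0, 1.0)
    return 1 if noise < z else -1


def neuron_preactivation(x, z_column, rng):
    """Add or subtract each input x_j according to the sign of its sampled weight."""
    acc = 0.0
    for x_j, z_ji in zip(x, z_column):
        if sample_weight(z_ji, rng) > 0:
            acc += x_j
        else:
            acc -= x_j
    return acc
```

For example, `neuron_preactivation([1.0, 2.0, 3.0], [reparametrize(0.8)] * 3, random.Random(0))` draws three weights that are each $+1$ with probability $0.8$ and returns the signed sum of the inputs. Because the weight only selects a sign, the accumulation needs no multiplication.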