A 65 nm Bayesian Neural Network Accelerator with 360 fJ/Sample In-Word GRNG for AI Uncertainty Estimation

Zephan M. Enciso; Boyang Cheng; Likai Pei; Jianbo Liu; Steven Davis; Michael Niemier; Ningyuan Cao

TL;DR

This paper addresses uncertainty estimation in safety-critical AI by presenting a 65 nm ASIC that integrates a 360 fJ/Sample in-word Gaussian random number generator (GRNG) directly into SRAM, enabling fully parallel compute-in-memory (CIM) for Bayesian neural networks (BNNs). The approach combines partial-BNN weight structuring with the weight decomposition $w_{(i,j)} = \mu_{(i,j)} + \sigma_{(i,j)}\epsilon$, where $\epsilon \sim \mathcal{N}(0,1)$, and uses two CIM subarrays to compute $\mathbf{X}\mu$ and $\mathbf{X}(\sigma\epsilon)$ in parallel. A novel in-word GRNG based on capacitor thermal noise generates Gaussian samples directly inside memory, while calibration mitigates errors from static variation. Measurements on a fabricated 65 nm chip show 5.12 GSa/s RNG throughput and 102 GOp/s NN throughput in a 0.45 mm$^2$ die, with RNG output exhibiting near-Gaussian normality ($r \approx 0.997$) and favorable energy-delay characteristics. The work improves uncertainty estimation metrics (lower ECE, higher predictive entropy on erroneous predictions) and demonstrates a practical path toward robust, energy-efficient edge AI with embedded stochastic sampling.
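The weight decomposition above is what lets the mean and stochastic terms be computed separately: because matrix multiplication is linear, $\mathbf{X}(\mu + \sigma\epsilon) = \mathbf{X}\mu + \mathbf{X}(\sigma\epsilon)$. The following NumPy sketch checks this identity numerically. It is only an illustration under stated assumptions: the layer sizes, the range of $\sigma$, and the function names are hypothetical, and floating-point arithmetic stands in for the chip's CIM subarrays.

```python
import numpy as np

def direct_layer(x, mu, sigma, eps):
    """Conventional path: materialise sampled weights, then multiply."""
    w = mu + sigma * eps          # w = mu + sigma * eps, element-wise
    return x @ w

def split_layer(x, mu, sigma, eps):
    """Decomposed path: two independent products, summed afterwards."""
    y_mu = x @ mu                     # deterministic term, X @ mu
    y_sigma_eps = x @ (sigma * eps)   # stochastic term, X @ (sigma * eps)
    return y_mu + y_sigma_eps         # stands in for the reduction step

rng = np.random.default_rng(0)
batch, n_in, n_out = 4, 8, 3          # hypothetical sizes, illustration only

x = rng.standard_normal((batch, n_in))
mu = rng.standard_normal((n_in, n_out))
sigma = rng.uniform(0.05, 0.2, (n_in, n_out))  # assumed range
eps = rng.standard_normal((n_in, n_out))       # one N(0, 1) draw per weight

y_direct = direct_layer(x, mu, sigma, eps)
y_split = split_layer(x, mu, sigma, eps)
print(np.allclose(y_direct, y_split))
```

In the split form, only the $\sigma\epsilon$ term depends on fresh random samples, so the $\mu$ term never needs to be rewritten between sampling iterations.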

Abstract

Uncertainty estimation is an indispensable capability for AI-enabled, safety-critical applications, e.g., autonomous vehicles or medical diagnosis. Bayesian neural networks (BNNs) use Bayesian statistics to provide both classification predictions and uncertainty estimation, but they suffer from high computational overhead associated with random number generation and repeated sample iterations. Furthermore, BNNs are not immediately amenable to acceleration through compute-in-memory architectures due to the frequent memory writes necessary after each RNG operation. To address these challenges, we present an ASIC that integrates 360 fJ/Sample Gaussian RNG directly into the SRAM memory words. This integration reduces RNG overhead and enables fully-parallel compute-in-memory operations for BNNs. The prototype chip achieves 5.12 GSa/s RNG throughput and 102 GOp/s neural network throughput while occupying 0.45 mm$^2$, bringing AI uncertainty estimation to edge computation.
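To make the "repeated sample iterations" concrete, the sketch below estimates predictive entropy for a single Bayesian fully-connected layer by Monte Carlo sampling. It is a generic textbook-style estimate, not the evaluation procedure of the paper: the softmax classifier head, the sample count, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predictive_entropy(x, mu, sigma, n_samples, rng):
    """Entropy of the class distribution averaged over sampled weights.

    Each iteration draws a fresh N(0, 1) value for every weight, which is
    the RNG and recomputation cost that BNN inference has to pay.
    """
    probs = np.zeros((x.shape[0], mu.shape[1]))
    for _ in range(n_samples):
        eps = rng.standard_normal(mu.shape)   # mu.size Gaussian samples
        probs += softmax(x @ (mu + sigma * eps))
    probs /= n_samples
    # Small constant guards against log(0); entropy is in nats.
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))               # hypothetical inputs
mu = rng.standard_normal((8, 3))              # hypothetical weight means
sigma = np.full((8, 3), 0.1)                  # hypothetical weight std devs
entropy = predictive_entropy(x, mu, sigma, n_samples=32, rng=rng)
print(entropy.shape)
```

For a classifier with $K$ classes the result lies between 0 (fully confident) and $\ln K$ (uniform prediction), so a threshold on this value is one possible way to flag inputs that need human review.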

Paper Structure (17 sections, 8 equations, 12 figures, 2 tables)

Introduction
Background
Bayesian Neural Networks
Compute-in-Memory Accelerators
BNN Hardware Acceleration
Chip Architecture and Circuit Design
Hardware-Software Co-Design
CIM Tile Architecture
In-Word GRNG Circuit
Capacitor Thermal Noise
GRNG Operation
Calibration for Static Variation
CIM Memory Words
Hardware Evaluation
In-Word GRNG
...and 2 more sections

Figures (12)

Figure 1: The role of uncertainty estimation in safety-critical applications. During typical operation, the model rapidly performs autonomous actions. However, if the model's certainty falls below a predetermined threshold, it would request human intervention to avoid catastrophic effects.
Figure 2: (Left) Conventional BNN neuron. Each weight uses a GRNG to sample from a Gaussian distribution, so the weight must store the distribution parameters $\mu$ and $\sigma$. (Right) BNN fully-connected (FC) layers incur significant overhead from multiple memory operations and GRNG compared to standard FC layers.
Figure 3: CIM tile architecture, featuring two subarrays for separately computing $\mathbf{X}\sigma\epsilon$ and $\mathbf{X}\mu$. Both subarrays receive the same input $\mathbf{X}$, and downstream reduction logic recombines the results.
Figure 4: GRNG circuit and timing diagram. Thermal noise causes $C_n$ to discharge at a different rate than $C_p$, producing an output pulse E whose duration follows a zero-mean Gaussian distribution.
Figure 5: $\sigma\epsilon$ and $\mu$ CIM word circuits. The $\sigma\epsilon$ word contains additional switches to interface with the GRNG and produce a differential output. The $\mu$ word's output is differential because the data is stored differentially across 2 SRAM cells per bit.
...and 7 more figures
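The Figure 4 caption attributes the GRNG's randomness to capacitor thermal noise and describes a differential output with zero mean. The sketch below illustrates the two underlying statistical facts only: the RMS thermal-noise voltage on a capacitor is $\sqrt{kT/C}$, and the difference of two independent, identically distributed Gaussian quantities is zero-mean with $\sqrt{2}$ times the standard deviation. It does not model the actual circuit; the capacitance value, the temperature, the independence of the two branches, and the function names are all assumptions.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def ktc_noise_rms(capacitance_f, temperature_k=300.0):
    """RMS thermal-noise voltage sqrt(kT/C) on a capacitor, in volts."""
    return np.sqrt(K_B * temperature_k / capacitance_f)

def differential_samples(sigma, n, rng):
    """Difference of two independent N(0, sigma^2) branches.

    Assumption: both branches see independent noise of equal strength,
    so any common offset cancels and the result is zero-mean Gaussian.
    """
    branch_p = rng.normal(0.0, sigma, n)
    branch_n = rng.normal(0.0, sigma, n)
    return branch_p - branch_n

rng = np.random.default_rng(0)
v_rms = ktc_noise_rms(1e-15)      # 1 fF is a hypothetical value
diff = differential_samples(v_rms, 200_000, rng)
print(v_rms, diff.mean(), diff.std())
```

Because the RMS noise scales as $1/\sqrt{C}$, smaller capacitors give a larger noise amplitude, which is one reason a thermal-noise source can be compact.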
