A 65 nm Bayesian Neural Network Accelerator with 360 fJ/Sample In-Word GRNG for AI Uncertainty Estimation

Zephan M. Enciso, Boyang Cheng, Likai Pei, Jianbo Liu, Steven Davis, Michael Niemier, Ningyuan Cao

TL;DR

This paper tackles the challenge of uncertainty estimation in safety-critical AI by presenting a 65 nm ASIC that tightly integrates a 360 fJ/Sample in-word Gaussian RNG (GRNG) into SRAM to enable fully-parallel compute-in-memory (CIM) for Bayesian neural networks (BNNs). The approach combines partial-BNN weight structuring with a weight-decomposition scheme $w_{(i,j)} = \mu_{(i,j)} + \sigma_{(i,j)}\epsilon$, where $\epsilon \sim \mathcal{N}(0,1)$, and two CIM subarrays to compute $\mathbf{X}\mu$ and $\mathbf{X}(\sigma\epsilon)$ in parallel. A novel in-word GRNG based on capacitor thermal noise generates Gaussian samples directly inside memory, while calibration mitigates static variation errors. Experimental results on a fabricated 65 nm chip show 5.12 GSa/s RNG throughput and 102 GOp/s NN throughput in a 0.45 mm$^2$ die, with RNG output exhibiting near-Gaussian normality ($r \approx 0.997$) and favorable energy-delay characteristics. The work improves uncertainty estimation metrics (lower expected calibration error (ECE), higher predictive entropy for errors) and demonstrates a practical path for robust, energy-efficient edge AI with embedded stochastic sampling.
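
The decomposition $w_{(i,j)} = \mu_{(i,j)} + \sigma_{(i,j)}\epsilon$ implies that a layer output can be split as $\mathbf{X}\mu + \mathbf{X}(\sigma\epsilon)$, which is why the two terms can be computed separately and summed afterwards. The following is a minimal software sketch of that identity using only the Python standard library. The function names, shapes, and numeric values are illustrative assumptions; the sketch does not model the chip's circuits, precision, or partial-BNN structuring.

```python
import random

def matvec(x, w):
    """Multiply a length-n vector x by an n-by-m matrix w (nested lists)."""
    return [sum(x[i] * w[i][j] for i in range(len(w))) for j in range(len(w[0]))]

def scale_noise(sigma, eps):
    """Elementwise product sigma * eps, the stochastic part of each weight."""
    return [[s * e for s, e in zip(s_row, e_row)] for s_row, e_row in zip(sigma, eps)]

def layer_direct(x, mu, sigma, eps):
    """Form w = mu + sigma * eps explicitly, then compute x @ w."""
    noise = scale_noise(sigma, eps)
    w = [[m + n for m, n in zip(m_row, n_row)] for m_row, n_row in zip(mu, noise)]
    return matvec(x, w)

def layer_split(x, mu, sigma, eps):
    """Compute x @ mu and x @ (sigma * eps) separately, then add the results."""
    mean_part = matvec(x, mu)
    noise_part = matvec(x, scale_noise(sigma, eps))
    return [a + b for a, b in zip(mean_part, noise_part)]

# Illustrative values only (assumed, not taken from the paper).
rng = random.Random(0)
x = [1.0, -2.0, 0.5]
mu = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]
sigma = [[0.05, 0.02], [0.01, 0.03], [0.04, 0.02]]
eps = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(3)]  # eps ~ N(0, 1)

direct = layer_direct(x, mu, sigma, eps)
split = layer_split(x, mu, sigma, eps)
```

Both paths agree up to floating-point rounding. In the split form only `eps` changes from one sample to the next, while `mu` and `sigma` stay fixed.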

Abstract

Uncertainty estimation is an indispensable capability for AI-enabled, safety-critical applications, e.g., autonomous vehicles or medical diagnosis. Bayesian neural networks (BNNs) use Bayesian statistics to provide both classification predictions and uncertainty estimation, but they suffer from high computational overhead associated with random number generation and repeated sample iterations. Furthermore, BNNs are not immediately amenable to acceleration through compute-in-memory architectures due to the frequent memory writes necessary after each RNG operation. To address these challenges, we present an ASIC that integrates 360 fJ/Sample Gaussian RNG directly into the SRAM memory words. This integration reduces RNG overhead and enables fully-parallel compute-in-memory operations for BNNs. The prototype chip achieves 5.12 GSa/s RNG throughput and 102 GOp/s neural network throughput while occupying 0.45 mm², bringing AI uncertainty estimation to edge computation.
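
To illustrate how repeated sample iterations turn into an uncertainty estimate, the sketch below averages the class probabilities from several stochastic forward passes and takes the entropy of the mean. This is the common textbook definition of predictive entropy; the exact metric definitions used in the paper are not shown in this section, so treat the formula, the function names, and the example logits as assumptions.

```python
import math

def softmax(logits):
    """Convert a list of logits into class probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_prediction(prob_samples):
    """Average class probabilities over the sampled forward passes."""
    n = len(prob_samples)
    return [sum(col) / n for col in zip(*prob_samples)]

def predictive_entropy(prob_samples):
    """Entropy (in nats) of the mean predictive distribution."""
    mean_p = mean_prediction(prob_samples)
    return -sum(p * math.log(p) for p in mean_p if p > 0.0)

# Hypothetical logits from four sampled passes of a 3-class model.
agree = [softmax([4.0, 0.0, 0.0]) for _ in range(4)]
disagree = [
    softmax([4.0, 0.0, 0.0]),
    softmax([0.0, 4.0, 0.0]),
    softmax([0.0, 0.0, 4.0]),
    softmax([4.0, 0.0, 0.0]),
]

low = predictive_entropy(agree)      # passes agree: lower entropy
high = predictive_entropy(disagree)  # passes disagree: higher entropy
```

When the sampled passes disagree, the averaged distribution flattens and its entropy rises, which is the kind of signal that can be compared against a threshold.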

Paper Structure (17 sections, 8 equations, 12 figures, 2 tables)

Figures (12)

  • Figure 1: The role of uncertainty estimation in safety-critical applications. During typical operation, the model rapidly performs autonomous actions. However, if the model's certainty falls below a predetermined threshold, it would request human intervention to avoid catastrophic effects.
  • Figure 2: Left) Conventional BNN neuron. Each weight uses GRNG to sample from a Gaussian distribution, so the weight must store distribution properties $\mu$ and $\sigma$. Right) BNN fully-connected (FC) layers incur significant overhead from multiple memory operations and GRNG compared to standard FC layers.
  • Figure 3: CIM tile architecture, featuring two subarrays for separately computing $\mathbf{X}\sigma\epsilon$ and $\mathbf{X}\mu$. Both subarrays receive the same input $\mathbf{X}$, and downstream reduction logic recombines the results.
  • Figure 4: GRNG circuit and timing diagram. Thermal noise causes $C_n$ to discharge at a different rate than $C_p$, producing an output pulse E whose duration follows a zero-mean Gaussian distribution.
  • Figure 5: $\sigma\epsilon$ and $\mu$ CIM word circuits. The $\sigma\epsilon$ word contains additional switches to interface with the GRNG and produce a differential output. The $\mu$ word's output is differential because the data is stored differentially across 2 SRAM cells per bit.
  • ...and 7 more figures