A 65 nm Bayesian Neural Network Accelerator with 360 fJ/Sample In-Word GRNG for AI Uncertainty Estimation
Zephan M. Enciso, Boyang Cheng, Likai Pei, Jianbo Liu, Steven Davis, Michael Niemier, Ningyuan Cao
TL;DR
This paper tackles the challenge of uncertainty estimation in safety-critical AI by presenting a 65 nm ASIC that integrates a 360 fJ/Sample in-word Gaussian RNG (GRNG) directly into SRAM, enabling fully-parallel compute-in-memory (CIM) for Bayesian neural networks. The approach combines partial-BNN weight structuring with a weight-decomposition scheme $w_{(i,j)} = \mu_{(i,j)} + \sigma_{(i,j)}\epsilon$, where $\epsilon \sim \mathcal{N}(0,1)$, and uses two CIM subarrays to compute $\mathbf{X}\mu$ and $\mathbf{X}(\sigma\epsilon)$ in parallel. A novel in-word GRNG based on capacitor thermal noise generates Gaussian samples directly inside memory, while calibration mitigates errors caused by static variation. Experimental results on a fabricated 65 nm chip show 5.12 GSa/s RNG throughput and 102 GOp/s NN throughput in a 0.45 mm^2 die, with an RNG output distribution close to Gaussian ($r \approx 0.997$) and favorable energy-delay characteristics. The work improves uncertainty estimation metrics (lower expected calibration error (ECE), higher predictive entropy on erroneous predictions) and demonstrates a practical path for robust, energy-efficient edge AI with embedded stochastic sampling.
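The weight decomposition above works because matrix multiplication is linear: $\mathbf{X}(\mu + \sigma\epsilon) = \mathbf{X}\mu + \mathbf{X}(\sigma\epsilon)$, so the mean term and the noise term can be computed separately and summed. The following is a minimal software sketch of that identity, not a model of the chip. It assumes an independent $\epsilon$ is drawn for every weight, and the names `matvec` and `sample_layer` as well as the example values are hypothetical.

```python
import random

def matvec(x, w):
    """Compute x @ w for a row vector x (length n) and a matrix w (n x m)."""
    n, m = len(w), len(w[0])
    return [sum(x[i] * w[i][j] for i in range(n)) for j in range(m)]

def sample_layer(x, mu, sigma, rng):
    """One stochastic pass through a linear layer with Gaussian weights.

    Returns (direct, decomposed): the output computed from explicitly
    sampled weights, and the output computed as X*mu + X*(sigma*eps).
    Assumption: one independent eps ~ N(0, 1) per weight.
    """
    n, m = len(mu), len(mu[0])
    eps = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
    sigma_eps = [[sigma[i][j] * eps[i][j] for j in range(m)] for i in range(n)]
    w = [[mu[i][j] + sigma_eps[i][j] for j in range(m)] for i in range(n)]

    direct = matvec(x, w)              # X * (mu + sigma * eps)
    mean_part = matvec(x, mu)          # X * mu, deterministic
    noise_part = matvec(x, sigma_eps)  # X * (sigma * eps), stochastic
    decomposed = [a + b for a, b in zip(mean_part, noise_part)]
    return direct, decomposed

# Illustrative values only; they are not taken from the paper.
rng = random.Random(0)
x = [1.0, -2.0, 0.5]
mu = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]
sigma = [[0.05, 0.02], [0.01, 0.03], [0.04, 0.02]]
direct, decomposed = sample_layer(x, mu, sigma, rng)
```

In this sketch only `noise_part` depends on fresh random samples, so the stored $\mu$ values never have to be rewritten between sampling iterations.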
Abstract
Uncertainty estimation is an indispensable capability for AI-enabled, safety-critical applications, e.g., autonomous vehicles and medical diagnosis. Bayesian neural networks (BNNs) use Bayesian statistics to provide both classification predictions and uncertainty estimation, but they suffer from high computational overhead associated with random number generation and repeated sample iterations. Furthermore, BNNs are not immediately amenable to acceleration through compute-in-memory architectures due to the frequent memory writes necessary after each RNG operation. To address these challenges, we present an ASIC that integrates a 360 fJ/Sample Gaussian RNG directly into the SRAM memory words. This integration reduces RNG overhead and enables fully-parallel compute-in-memory operations for BNNs. The prototype chip achieves 5.12 GSa/s RNG throughput and 102 GOp/s neural network throughput while occupying 0.45 mm^2, bringing AI uncertainty estimation to edge computation.
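To illustrate where the RNG and repeated-sampling overhead comes from, the sketch below runs a small single-layer Bayesian classifier several times and derives a predictive entropy from the averaged class probabilities. It is a plain software illustration under stated assumptions, not the chip's dataflow: the single linear layer, the softmax output, and the names `softmax` and `predictive_entropy` are all hypothetical.

```python
import math
import random

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [v / total for v in exps]

def predictive_entropy(x, mu, sigma, num_samples, rng):
    """Monte Carlo estimate of the predictive distribution and its entropy.

    Each pass draws a fresh Gaussian sample for every weight, so the loop
    consumes num_samples * n * m random numbers in total.
    """
    n, m = len(mu), len(mu[0])
    mean_probs = [0.0] * m
    for _ in range(num_samples):
        logits = [
            sum(x[i] * (mu[i][j] + sigma[i][j] * rng.gauss(0.0, 1.0))
                for i in range(n))
            for j in range(m)
        ]
        probs = softmax(logits)
        mean_probs = [acc + p / num_samples for acc, p in zip(mean_probs, probs)]
    # Entropy (in nats) of the averaged class probabilities.
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)
    return mean_probs, entropy

# Illustrative values only; they are not taken from the paper.
rng = random.Random(0)
x = [1.0, -2.0, 0.5]
mu = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]
sigma = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
probs, entropy = predictive_entropy(x, mu, sigma, num_samples=32, rng=rng)
```

Even this toy layer needs 32 × 3 × 2 = 192 Gaussian samples for one input, which gives a sense of why per-sample RNG energy and throughput matter at network scale.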
