Testing Spintronics Implemented Monte Carlo Dropout-Based Bayesian Neural Networks
Soyed Tuhin Ahmed, Michael Hefenbrock, Guillaume Prenat, Lorena Anghel, Mehdi B. Tahoori
TL;DR
This work addresses the reliability and testing of dropout-based Bayesian neural networks (BayNNs) deployed on spintronics-based computation-in-memory (CIM) hardware, where stochastic dropout and hardware non-idealities make deterministic functional testing difficult. It models these non-idealities, introduces an automatic test pattern generation framework based on repeatability ranking, and develops a lightweight online fault-detection method that leverages a Gaussian uncertainty distribution with bounds $\mu \pm 3\sigma$. The authors report near-complete fault coverage for critical faults and conductance variations across SpinDrop, SpatialSpinDrop, and ScaleDrop on CIFAR-10 with ResNet-18, using only $0.2\%$ of the training data as test vectors. The approach achieves high fault-detection efficiency and low false-alarm rates, and the paper includes detailed overhead and scalability analyses, offering a practical path to safe, in-field testing of BayNNs on spintronic hardware.
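As a rough illustration of the online check, the sketch below fits a Gaussian to uncertainty values collected from a fault-free model and flags any later reading that falls outside $\mu \pm 3\sigma$. The function names, the scalar uncertainty metric, and the synthetic calibration data are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import random
import statistics


def calibrate_bounds(uncertainties, k=3.0):
    """Return (mu - k*sigma, mu + k*sigma) estimated from fault-free readings."""
    mu = statistics.fmean(uncertainties)
    sigma = statistics.stdev(uncertainties)
    return mu - k * sigma, mu + k * sigma


def is_faulty(observed_uncertainty, bounds):
    """Flag a reading that lies outside the calibrated interval."""
    lower, upper = bounds
    return not (lower <= observed_uncertainty <= upper)


# Hypothetical calibration data: uncertainty readings of a fault-free model,
# drawn here from a synthetic Gaussian (mean 0.20, std 0.02).
rng = random.Random(0)
fault_free = [rng.gauss(0.20, 0.02) for _ in range(1000)]
bounds = calibrate_bounds(fault_free)

print(is_faulty(0.21, bounds))  # reading close to the fault-free mean
print(is_faulty(0.45, bounds))  # reading far outside the fault-free range
```

Under a Gaussian assumption, about 99.7% of fault-free readings fall inside the $3\sigma$ interval, which is why such a bound can keep false alarms rare while still catching large shifts in uncertainty.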
Abstract
Bayesian Neural Networks (BayNNs) can inherently estimate predictive uncertainty, facilitating informed decision-making. Dropout-based BayNNs are increasingly implemented in spintronics-based computation-in-memory architectures for resource-constrained yet high-performance safety-critical applications. Although uncertainty estimation is important, the reliability of Dropout generation and BayNN computation is equally important for the target applications, yet it is overlooked in existing works. Moreover, testing BayNNs is significantly more challenging than testing conventional NNs because of their stochastic nature. In this paper, we present, for the first time, a model of the non-idealities of the spintronics-based Dropout module and analyze their impact on uncertainty estimates and accuracy. Furthermore, we propose a testing framework based on repeatability ranking for Dropout-based BayNNs that achieves up to $100\%$ fault coverage while using only $0.2\%$ of the training data as test vectors.
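To put the test-set size in perspective, CIFAR-10 has 50,000 training images, so $0.2\%$ corresponds to 100 test vectors. The abstract does not spell out how repeatability ranking works, so the sketch below is only one plausible reading. It assumes that a candidate input is scored by the spread of a stochastic model's output over repeated forward passes, and that the most repeatable candidates are kept. The toy scalar model and all names are hypothetical.

```python
import random
import statistics


def repeatability_score(model, x, passes=20):
    """Spread of the output over repeated stochastic passes (lower = more repeatable)."""
    outputs = [model(x) for _ in range(passes)]
    return statistics.pstdev(outputs)


def select_test_vectors(model, candidates, fraction=0.002, passes=20):
    """Rank candidates from most to least repeatable and keep the top fraction."""
    n_keep = max(1, round(len(candidates) * fraction))
    ranked = sorted(candidates, key=lambda x: repeatability_score(model, x, passes))
    return ranked[:n_keep]


# Hypothetical stand-in for a dropout-based network: a scalar function whose
# output noise grows with |x|, so inputs near zero are the most repeatable.
rng = random.Random(0)


def toy_model(x):
    return x + rng.gauss(0.0, 0.01 + abs(x))


candidates = [i / 1000 for i in range(-500, 500)]  # 1000 candidate inputs
selected = select_test_vectors(toy_model, candidates)

print(len(selected))           # 0.2% of 1000 candidates
print(round(50_000 * 0.002))   # 0.2% of the CIFAR-10 training set
```

The intuition behind this reading is that inputs with repeatable outputs make a deviation caused by a fault easier to separate from the network's own stochasticity.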
