Error Detection and Correction Codes for Safe In-Memory Computations
Luca Parrini, Taha Soliman, Benjamin Hettwer, Jan Micha Borrmann, Simranjeet Singh, Ankit Bende, Vikas Rana, Farhad Merchant, Norbert Wehn
TL;DR
The paper addresses accuracy loss in in-memory computing (IMC) AI accelerators caused by device non-idealities, a serious concern in safety-critical applications. It introduces two neural checksum blocks, implemented at the crossbar and PE levels, together with an IMC Error Detection and Correction Routine (IEDCR) that detects and corrects arithmetic errors at run-time, aiming for better accuracy recovery and hardware efficiency than traditional methods such as Triple Modular Redundancy (TMR). Results on FeFET and RRAM technologies show that the approach recovers a large portion of the original NN accuracy (often >91% and up to ~95%) with significantly lower area overhead than TMR and a latency overhead that depends on the chosen configuration. The method is validated on CIFAR-10 with ResNet variants and NiN, indicating its scalability, technology-agnostic applicability, and potential as a practical safety mechanism for in-memory neural computation.
Abstract
In-Memory Computing (IMC) introduces a new paradigm of computation that offers high efficiency in terms of latency and power consumption for AI accelerators. However, the non-idealities and defects of the emerging technologies used in advanced IMC can severely degrade the inference accuracy of Neural Networks (NN) and lead to malfunctions in safety-critical applications. In this paper, we investigate an architectural-level mitigation technique based on the coordinated action of multiple checksum codes to detect and correct errors at run-time. This implementation recovers accuracy more efficiently than traditional methods such as Triple Modular Redundancy (TMR) across different AI algorithms and technologies. The results show that several configurations of our implementation recover more than 91% of the original accuracy with less than half of the area required by TMR and less than 40% latency overhead.
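To make the idea of checksum codes on a matrix-vector multiplication concrete, the following is a minimal sketch of a classical checksum scheme with one plain and one index-weighted checksum column. It is an illustration under stated assumptions, not the checksum blocks or the IEDCR described in the paper: the function names (`encode`, `crossbar_mvm`, `detect_and_correct`), the integer weights, and the single additive fault model are all hypothetical choices made for this example.

```python
def encode(weights):
    """Append a plain and an index-weighted checksum column to each row.

    Assumption: rows correspond to inputs, columns to outputs.
    """
    return [row + [sum(row), sum((j + 1) * w for j, w in enumerate(row))]
            for row in weights]


def crossbar_mvm(inputs, matrix, fault=None):
    """Column-wise dot products, mimicking how a crossbar accumulates.

    `fault` is an optional (column, offset) pair that corrupts one output;
    this additive fault model is a simplification for illustration.
    """
    cols = len(matrix[0])
    out = [sum(x * row[j] for x, row in zip(inputs, matrix))
           for j in range(cols)]
    if fault is not None:
        col, offset = fault
        out[col] += offset
    return out


def detect_and_correct(outputs):
    """Return (data, status) for outputs that end with the two checksums."""
    *data, c1, c2 = outputs
    # Syndromes: both are zero when data and checksums agree.
    s1 = sum(data) - c1
    s2 = sum((j + 1) * y for j, y in enumerate(data)) - c2
    if s1 == 0 and s2 == 0:
        return data, "ok"
    if s1 == 0 or s2 == 0:
        # Only one syndrome fired: a checksum output is wrong, data is intact.
        return data, "checksum_fault"
    if s2 % s1 == 0 and 1 <= s2 // s1 <= len(data):
        # A single data error of size s1 at column j gives s2 = (j + 1) * s1.
        j = s2 // s1 - 1
        data[j] -= s1
        return data, "corrected"
    return data, "uncorrectable"


weights = [[1, 2, 3], [4, 5, 6]]
inputs = [2, 1]
encoded = encode(weights)

clean = detect_and_correct(crossbar_mvm(inputs, encoded))
faulty = detect_and_correct(crossbar_mvm(inputs, encoded, fault=(1, 3)))
print(clean)   # ([6, 9, 12], 'ok')
print(faulty)  # ([6, 9, 12], 'corrected')
```

The two extra columns cost far less than the two full copies of the array that a TMR-style scheme needs, which is the kind of area trade-off the abstract refers to. The sketch only locates a single faulty output per multiplication, and real analog crossbars produce noisy non-integer outputs that would need a tolerance threshold rather than exact equality.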
