All-in-Memory Stochastic Computing using ReRAM
João Paulo C. de Lima, Mehran Shoushtari Moghadam, Sercan Aygun, Jeronimo Castrillon, M. Hassan Najafi, Asif Ali Khan
TL;DR
The paper presents an all-in-ReRAM stochastic computing (SC) pipeline that generates stochastic bit-streams (SBSs), executes SC operations, and converts the results back to binary entirely within ReRAM arrays. By decoupling random number generation (RNG) from SBS generation and employing Scouting Logic for in-memory operations, it achieves end-to-end SC that is robust to ReRAM variability. Compared to CMOS-based and other ReRAM-based baselines, the proposed design delivers up to 2.16× throughput and 2.8× energy improvements, while incurring only about a 5% average drop in image processing quality under compute-in-memory (CIM) faults. This approach reduces data movement and treats the intrinsic variability of ReRAM as a feature rather than a flaw, enabling efficient, fault-tolerant in-memory computing for edge AI tasks.
Abstract
As the demand for efficient, low-power computing in embedded and edge devices grows, traditional computing methods are becoming less effective for handling complex tasks. Stochastic computing (SC) offers a promising alternative by approximating complex arithmetic operations, such as addition and multiplication, using simple bitwise operations, like majority or AND, on random bit-streams. While SC operations are inherently fault-tolerant, their accuracy largely depends on the length and quality of the stochastic bit-streams (SBSs). These bit-streams are typically generated by CMOS-based stochastic bit-stream generators that consume over 80% of the SC system's power and area. Current SC solutions focus on optimizing the logic gates but often neglect the high cost of moving the bit-streams between memory and processor. This work leverages the physics of emerging ReRAM devices to implement the entire SC flow in place: (1) generating low-cost true random numbers and SBSs, (2) conducting SC operations, and (3) converting SBSs back to binary. Considering the low reliability of ReRAM cells, we demonstrate how SC's robustness to errors copes with ReRAM's variability. Our evaluation shows significant improvements in throughput (1.39x, 2.16x) and energy consumption (1.15x, 2.8x) over state-of-the-art (CMOS- and ReRAM-based) solutions, respectively, with an average image quality drop of 5% across multiple SBS lengths and image processing tasks.
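The three-step SC flow described in the abstract (generate SBSs, operate on them bitwise, convert back to binary) can be illustrated with a minimal software sketch. This is not the paper's ReRAM implementation: Python's seeded pseudo-random generator stands in for the ReRAM-based true random source, unipolar encoding is assumed (a value p in [0, 1] is the probability of a 1 bit), and the function names are hypothetical. The majority-based addition assumes a third, independent stream with probability 0.5, which yields the scaled sum (a + b) / 2.

```python
import random

def generate_sbs(p, length, rng):
    """Step 1: encode p in [0, 1] as a bit-stream whose bits are 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sc_multiply(a, b):
    """Step 2a: bitwise AND of two independent streams approximates a * b."""
    return [x & y for x, y in zip(a, b)]

def sc_scaled_add(a, b, half):
    """Step 2b: bitwise majority with a p = 0.5 stream approximates (a + b) / 2."""
    return [(x & y) | (x & s) | (y & s) for x, y, s in zip(a, b, half)]

def sbs_to_value(bits):
    """Step 3: convert back to binary by counting ones (popcount / length)."""
    return sum(bits) / len(bits)

rng = random.Random(0)   # assumption: software PRNG in place of a hardware TRNG
N = 4096                 # SBS length; longer streams reduce the random error

a = generate_sbs(0.5, N, rng)
b = generate_sbs(0.25, N, rng)
half = generate_sbs(0.5, N, rng)

product = sbs_to_value(sc_multiply(a, b))            # expected near 0.5 * 0.25
scaled_sum = sbs_to_value(sc_scaled_add(a, b, half)) # expected near (0.5 + 0.25) / 2

print(f"product    ~ {product:.4f}")
print(f"scaled sum ~ {scaled_sum:.4f}")
```

Rerunning with a smaller `N` shows why the abstract ties accuracy to SBS length: the result is a sample mean over `N` bits, so its random error shrinks roughly as 1/sqrt(N).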
