All-in-Memory Stochastic Computing using ReRAM

João Paulo C. de Lima, Mehran Shoushtari Moghadam, Sercan Aygun, Jeronimo Castrillon, M. Hassan Najafi, Asif Ali Khan

TL;DR

The paper presents an all-in-memory stochastic computing (SC) pipeline that generates stochastic bit-streams (SBSs), executes SC operations, and converts the results back to binary entirely within ReRAM arrays. By decoupling random number generation (RNG) from SBS generation and using Scouting Logic for in-memory operations, it achieves end-to-end SC that is robust to ReRAM variability. The proposed design improves throughput by 1.39× and 2.16×, and energy consumption by 1.15× and 2.8×, over CMOS-based and ReRAM-based baselines, respectively, at the cost of an average image quality drop of about 5% under compute-in-memory (CIM) faults. The approach reduces data movement and treats the intrinsic variability of ReRAM as a feature rather than a flaw, enabling efficient, fault-tolerant in-memory computing for edge AI tasks.

Abstract

As the demand for efficient, low-power computing in embedded and edge devices grows, traditional computing methods are becoming less effective for handling complex tasks. Stochastic computing (SC) offers a promising alternative by approximating complex arithmetic operations, such as addition and multiplication, using simple bitwise operations, like majority or AND, on random bit-streams. While SC operations are inherently fault-tolerant, their accuracy largely depends on the length and quality of the stochastic bit-streams (SBS). These bit-streams are typically generated by CMOS-based stochastic bit-stream generators that consume over 80% of the SC system's power and area. Current SC solutions focus on optimizing the logic gates but often neglect the high cost of moving the bit-streams between memory and processor. This work leverages the physics of emerging ReRAM devices to implement the entire SC flow in place: (1) generating low-cost true random numbers and SBSs, (2) conducting SC operations, and (3) converting SBSs back to binary. Considering the low reliability of ReRAM cells, we demonstrate how SC's robustness to errors copes with ReRAM's variability. Our evaluation shows significant improvements in throughput (1.39x, 2.16x) and energy consumption (1.15x, 2.8x) over state-of-the-art (CMOS- and ReRAM-based) solutions, respectively, with an average image quality drop of 5% across multiple SBS lengths and image processing tasks.
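
The three-step flow described in the abstract (generate SBSs, operate on them bitwise, convert back to binary) can be illustrated in software. The sketch below is not the paper's ReRAM implementation: it assumes a comparator-style generator driven by Python's pseudo-random `random.Random`, unipolar encoding (a value in [0, 1] is the fraction of 1s in the stream), and independent input streams. The function names `generate_sbs`, `sc_multiply`, `sc_scaled_add`, and `sbs_to_value` are hypothetical and chosen for illustration only.

```python
import random


def generate_sbs(p, length, rng):
    """Comparator-style generation: bit i is 1 when a fresh random number is below p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a, b):
    """Bitwise AND of two independent unipolar streams approximates a * b."""
    return [x & y for x, y in zip(a, b)]


def sc_scaled_add(a, b, half):
    """Bitwise majority of a, b and a 0.5-valued stream approximates (a + b) / 2."""
    return [1 if (x + y + h) >= 2 else 0 for x, y, h in zip(a, b, half)]


def sbs_to_value(stream):
    """Convert a stream back to a number by counting its 1s."""
    return sum(stream) / len(stream)


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 4096                 # longer streams give lower approximation error

x = generate_sbs(0.5, n, rng)
y = generate_sbs(0.4, n, rng)
half = generate_sbs(0.5, n, rng)

product = sbs_to_value(sc_multiply(x, y))             # close to 0.5 * 0.4 = 0.2
scaled_sum = sbs_to_value(sc_scaled_add(x, y, half))  # close to (0.5 + 0.4) / 2 = 0.45

print(f"product ~ {product:.3f}, scaled sum ~ {scaled_sum:.3f}")
```

The results are estimates rather than exact values, and their error shrinks as `n` grows, which is the dependence on SBS length that the abstract refers to.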

Paper Structure

This paper contains 14 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: A high-level overview of our proposed in-memory SC solution: (a) ReRAM array, (b) Greater-than operation using basic logic gates, (c) Write latches in the peripheral circuitry.
  • Figure 2: In-ReRAM SBS generation and SC arithmetic operations. CORDIV is considered for SC division with $X\!\leqslant\!Y$. Addition uses OR; the inputs are restricted to the $[0,0.5]$ interval so that the output does not exceed $1.0$.
  • Figure 3: SC image processing applications: (a) Image Compositing, merging background and foreground images with $\alpha$ channel. (b) Bilinear Interpolation, up-scaling input images. (c) Image Matting, parsing $\alpha$ channel for the foreground object to separate the background.
  • Figure 4: Normalized energy savings for CMOS (✛) and ReRAM (✦) designs.
  • Figure 5: Normalized throughput for CMOS (✛) and ReRAM (✦) designs.
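
As a rough illustration of the image compositing task in Figure 3(a), the sketch below blends one foreground and one background pixel using the textbook SC formulation, in which a multiplexer selected by the $\alpha$ stream yields an output whose expected value is $\alpha F + (1-\alpha)B$. This is an assumption for illustration: the figure caption does not specify the circuit the paper uses, the streams here come from a software pseudo-random generator rather than ReRAM, and the names `to_sbs` and `composite_pixel` are hypothetical.

```python
import random


def to_sbs(p, length, rng):
    """Unipolar stream whose fraction of 1s approximates p (p in [0, 1])."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def composite_pixel(fg, bg, alpha, length, rng):
    """Blend two normalized pixel intensities with a bitwise multiplexer.

    Each output bit copies the foreground bit when the alpha bit is 1 and the
    background bit otherwise, so the expected output is
    alpha * fg + (1 - alpha) * bg.
    """
    f = to_sbs(fg, length, rng)
    b = to_sbs(bg, length, rng)
    a = to_sbs(alpha, length, rng)
    out = [fb if ab else bb for fb, bb, ab in zip(f, b, a)]
    return sum(out) / length


rng = random.Random(1)  # fixed seed so the run is repeatable
estimate = composite_pixel(fg=0.8, bg=0.2, alpha=0.25, length=4096, rng=rng)
exact = 0.25 * 0.8 + (1 - 0.25) * 0.2  # 0.35

print(f"stochastic estimate ~ {estimate:.3f}, exact = {exact:.3f}")
```

Applying `composite_pixel` to every pixel of a normalized image pair would give a full composite; shorter streams trade accuracy for fewer bit operations.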