STT-RAM-based Hierarchical In-Memory Computing
Dhruv Gajaria, Kevin Antony Gomez, Tosiron Adegbija
TL;DR
This work introduces Hierarchical In-Memory Computing (HiMC), combining relaxed-retention STT-RAM-based PiC at the cache level with non-volatile STT-RAM PiM in main memory to minimize data movement and energy. It shows that PiC can deliver substantial latency and energy benefits for CPU-dependent workloads, while PiM remains advantageous for CPU-independent workloads, and demonstrates that a heterogeneous two-retention cache design can further optimize overall performance. The authors develop an architectural framework including a retention-time monitor, operation chaining, and PiC/PiM management, and provide a thorough evaluation across eight workloads using validated simulation tools, yielding multi-fold speedups in the best cases and significant area reductions compared to SRAM. The study highlights open research challenges in bit-line computing, compiler/hardware co-design, and data-flow-like architectures, setting a path for scalable, energy-efficient in-memory computing in resource-constrained systems.
Abstract
In-memory computing promises to overcome the von Neumann bottleneck in computer systems by performing computations directly within the memory. Previous research has suggested using Spin-Transfer Torque RAM (STT-RAM) for in-memory computing due to its non-volatility, low leakage power, high density, endurance, and commercial viability. This paper explores hierarchical in-memory computing, where different levels of the memory hierarchy are augmented with processing elements to optimize workload execution. The paper investigates processing in memory (PiM) using non-volatile STT-RAM and processing in cache (PiC) using volatile STT-RAM with relaxed retention, which helps mitigate STT-RAM's write latency and energy overheads. We analyze tradeoffs and overheads associated with data movement for PiC versus write overheads for PiM using STT-RAMs for various workloads. We examine workload characteristics, such as computational intensity and CPU-dependent workloads with limited instruction-level parallelism, and their impact on PiC/PiM tradeoffs. Using these workloads, we evaluate computing in STT-RAM versus SRAM at different cache hierarchy levels and explore the potential of heterogeneous STT-RAM cache architectures with various retention times for PiC and CPU-based computing. Our experiments reveal significant advantages of STT-RAM-based PiC over PiM for specific workloads. Finally, we describe open research problems in hierarchical in-memory computing architectures to further enhance this paradigm.
