PULSAR: Simultaneous Many-Row Activation for Reliable and High-Performance Computing in Off-the-Shelf DRAM Chips
Ismail Emir Yuksel, Yahya Can Tugrul, F. Nisa Bostanci, Abdullah Giray Yaglikci, Ataberk Olgun, Geraldo F. Oliveira, Melina Soysal, Haocong Luo, Juan Gomez Luna, Mohammad Sadrosadati, Onur Mutlu
TL;DR
PULSAR addresses the data-movement bottleneck in modern systems by enabling high-success-rate, high-performance Processing-using-Memory on unmodified DRAM chips. It achieves this through simultaneous activation of up to $32$ rows, input replication to boost MAJ reliability, and new PuM primitives (Multi-/RowInit, Many-Input Charge-Sharing, Bulk-Write) that unlock broader in-DRAM compute capabilities. Empirical evaluation on 120 DDR4 modules demonstrates a $24.18\%$ improvement in MAJ3 success rate and a $121\%$ performance boost over FracDRAM, with MAJ5/7/9 gains and faster cold-boot-content destruction up to $20.87\times$ over RowClone. The work shows that unmodified DRAM can support richer PuM primitives with meaningful performance benefits, suggesting practical potential for broader adoption and future manufacturer support.
Abstract
Data movement between the processor and the main memory is a first-order obstacle against improving performance and energy efficiency in modern systems. To address this obstacle, Processing-using-Memory (PuM) is a promising approach where bulk-bitwise operations are performed leveraging intrinsic analog properties within the DRAM array and massive parallelism across DRAM columns. Unfortunately, 1) modern off-the-shelf DRAM chips do not officially support PuM operations, and 2) existing techniques of performing PuM operations on off-the-shelf DRAM chips suffer from two key limitations. First, these techniques have low success rates, i.e., only a small fraction of DRAM columns can correctly execute PuM operations because they operate beyond manufacturer-recommended timing constraints, causing these operations to be highly susceptible to noise and process variation. Second, these techniques have limited compute primitives, preventing them from fully leveraging parallelism across DRAM columns and thus hindering their performance benefits. We propose PULSAR, a new technique to enable high-success-rate and high-performance PuM operations in off-the-shelf DRAM chips. PULSAR leverages our new observation that a carefully crafted sequence of DRAM commands simultaneously activates up to 32 DRAM rows. PULSAR overcomes the limitations of existing techniques by 1) replicating the input data to improve the success rate and 2) enabling new bulk bitwise operations (e.g., many-input majority, Multi-RowInit, and Bulk-Write) to improve the performance. Our analysis on 120 off-the-shelf DDR4 chips from two major manufacturers shows that PULSAR achieves a 24.18% higher success rate and 121% higher performance over seven arithmetic-logic operations compared to FracDRAM, a state-of-the-art off-the-shelf DRAM-based PuM technique.
