A Modern Primer on Processing in Memory
Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, Rachata Ausavarungnirun, Mohammad Sadrosadati, Geraldo F. Oliveira
TL;DR
This work articulates a data-centric shift to overcome the dominant data-movement bottleneck in modern systems by advancing Processing-in-Memory (PIM). It surveys two main approaches, Processing-Using-Memory (PUM) and Processing-Near-Memory (PNM), enabled by 3D-stacked memory and non-volatile memories, and details a broad spectrum of substrates, from RowClone and SIMDRAM to Tesseract and NATSA. The paper catalogs cross-layer research spanning devices, architectures, runtimes, programming models, and security, and presents real hardware prototypes (UPMEM, FIMDRAM, AxDIMM, AiM) and storage-centric systems (MegIS). It emphasizes challenges in programming models, coherence, and virtual memory while showcasing substantial benefits in performance and energy, with exemplary gains in graph processing, HTAP, time-series analysis, and genome workflows. Ultimately, it argues for a principled, data-centric future where computation moves toward data, enabling energy-efficient, scalable, and sustainable systems across domains.
Abstract
This paper discusses recent research that aims to enable computation close to data, an approach we broadly call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside memory chips or modules, in the logic layer of 3D-stacked memory, in the memory controllers, in storage devices or chips), so that data movement between the computation units and memory/storage units is reduced or eliminated. While the general idea of PIM is not new, we discuss motivating trends in applications as well as memory circuits and technology that greatly exacerbate the need for enabling it in modern computing systems. We examine at least two promising new approaches to designing PIM systems to accelerate important data-intensive applications: (1) processing-using-memory, which exploits fundamental analog operational principles of memory chips to perform massively parallel operations in situ in memory, and (2) processing-near-memory, which exploits different logic and memory integration technologies (e.g., 3D-stacked memory technology) to place computation logic close to memory circuitry, and thereby enable high-bandwidth, low-energy, and low-latency access to data. In both approaches, we describe and tackle relevant cross-layer research, design, and adoption challenges in devices, architecture, systems, compilers, programming models, and applications. Our focus is on the development of PIM designs that can be adopted in real computing platforms at low cost. We conclude by discussing work on solving key challenges to the practical adoption of PIM. We believe that the shift from a processor-centric to a memory-centric mindset (and infrastructure) remains the largest adoption challenge for PIM, which, once overcome, can unleash a fundamentally energy-efficient, high-performance, and sustainable new way of designing, using, and programming computing systems.
