A Modern Primer on Processing in Memory

Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, Rachata Ausavarungnirun, Mohammad Sadrosadati, Geraldo F. Oliveira

TL;DR

This work articulates a data-centric shift to overcome the dominant data-movement bottleneck in modern systems by advancing Processing-in-Memory (PIM). It surveys two main approaches, Processing-Using-Memory (PUM) and Processing-Near-Memory (PNM), enabled by 3D-stacked memory and non-volatile memories, and details a broad spectrum of substrates, from RowClone and SIMDRAM to Tesseract and NATSA. The paper catalogs cross-layer research spanning devices, architectures, runtimes, programming models, and security, and presents real hardware prototypes (UPMEM, FIMDRAM, AxDIMM, AiM) and storage-centric systems (MegIS). It emphasizes challenges in programming models, coherence, and virtual memory while showcasing substantial benefits in performance and energy, with exemplary gains in graph processing, HTAP, time-series analysis, and genome workflows. Ultimately, it argues for a principled, data-centric future where computation moves toward data, enabling energy-efficient, scalable, and sustainable systems across domains.

Abstract

This paper discusses recent research that aims to enable computation close to data, an approach we broadly call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside memory chips or modules, in the logic layer of 3D-stacked memory, in the memory controllers, in storage devices or chips), so that data movement between the computation units and memory/storage units is reduced or eliminated. While the general idea of PIM is not new, we discuss motivating trends in applications as well as memory circuits and technology that greatly exacerbate the need for enabling it in modern computing systems. We examine at least two promising new approaches to designing PIM systems to accelerate important data-intensive applications: (1) processing-using-memory, which exploits fundamental analog operational principles of memory chips to perform massively-parallel operations in-situ in memory, (2) processing-near-memory, which exploits different logic and memory integration technologies (e.g., 3D-stacked memory technology) to place computation logic close to memory circuitry, and thereby enable high-bandwidth, low-energy, and low-latency access to data. In both approaches, we describe and tackle relevant cross-layer research, design, and adoption challenges in devices, architecture, systems, compilers, programming models, and applications. Our focus is on the development of PIM designs that can be adopted in real computing platforms at low cost. We conclude by discussing work on solving key challenges to the practical adoption of PIM. We believe that the shift from a processor-centric to a memory-centric mindset (and infrastructure) remains the largest adoption challenge for PIM, which, once overcome, can unleash a fundamentally energy-efficient, high-performance, and sustainable new way of designing, using, and programming computing systems.

Paper Structure

This paper contains 53 sections, 61 figures, 1 table.

Figures (61)

  • Figure 1: Overview of DRAM technology scaling over more than five decades.
  • Figure 2: The relative failure rate for servers using DRAM chips with different densities. Higher density chips (related to newer technology nodes) correlate with higher server failure rates. Reproduced from mutlu.accml23.talk. Originally presented in meza.dsn15.
  • Figure 3: RowHammer vulnerability for DRAM modules manufactured between 2008 and 2014. Reproduced from mutlu.nsfpim20. Originally presented in kim-isca2014, kim.isca2014talk.
  • Figure 4: Fundamental difference between RowHammer and RowPress (i.e., how long an aggressor row is kept open) and the resulting effect on the number of activations required to induce a bitflip. Reproduced from mutlu.njit2023talk.
  • Figure 5: Distributions of the minimum number of total aggressor row activations needed to cause at least one bitflip ($AC_{min}$) for conventional RowHammer and three representative cases of RowPress at 80 °C, with one (single-sided) and two (double-sided) aggressor row(s), across 164 DDR4 chips from manufacturers S, H, and M (i.e., Samsung, SK Hynix, Micron). Reproduced from luo2023rowpress.
  • ...and 56 more figures