DaPPA: A Data-Parallel Programming Framework for Processing-in-Memory Architectures

Geraldo F. Oliveira, Alain Kohli, David Novo, Ataberk Olgun, A. Giray Yaglikci, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu

TL;DR

The DaPPA paper tackles the programming burden of processing-in-memory on UPMEM by proposing a data-parallel pattern-based framework that automatically distributes work, manages data movement, and compiles to optimized DPU code. Built on a high-level, skeleton-like approach, DaPPA provides five data-parallel primitives, a dataflow Pipeline interface, and a dynamic template-based compiler that generates UPMEM binaries without exposing hardware details to the programmer. The key contributions are (i) the first data-parallel pattern-based abstraction for UPMEM, (ii) automatic input/output distribution and workload partitioning across DPUs, (iii) compiler-driven optimizations and code transformations that bridge patterns to UPMEM code, and (iv) productivity and performance gains demonstrated on PrIM workloads (2.1× average end-to-end speedup and 94% LOC reduction). The results indicate that DaPPA enables efficient and developer-friendly programming for UPMEM PIM systems and could encourage broader adoption of PIM by reducing programming complexity while delivering competitive performance.

Abstract

The growing volume of data in modern applications has led to significant computational costs in conventional processor-centric systems. Processing-in-memory (PIM) architectures alleviate these costs by moving computation closer to memory, reducing data movement overheads. UPMEM is the first commercially available PIM system, featuring thousands of in-order processors (DPUs) integrated within DRAM modules. However, programming a UPMEM-based system remains challenging due to the need for explicit data management and workload partitioning across DPUs. We introduce DaPPA (data-parallel processing-in-memory architecture), a programming framework that eases the programmability of UPMEM systems by automatically managing data movement, memory allocation, and workload distribution. The key idea behind DaPPA is to leverage a high-level data-parallel pattern-based programming interface to abstract hardware complexities away from the programmer. DaPPA comprises three main components: (i) data-parallel pattern APIs, a collection of five primary data-parallel pattern primitives that allow the programmer to express data transformations within an application; (ii) a dataflow programming interface, which allows the programmer to define how data moves across data-parallel patterns; and (iii) dynamic template-based compilation, which leverages code skeletons and dynamic code transformations to convert data-parallel patterns implemented via the dataflow programming interface into an optimized UPMEM binary. We evaluate DaPPA using six workloads from the PrIM benchmark suite on a real UPMEM system. Compared to hand-tuned implementations, DaPPA improves end-to-end performance by 2.1×, on average, and reduces programming complexity (measured in lines of code) by 94%. Our results demonstrate that DaPPA is an effective programming framework for efficient and user-friendly programming on UPMEM systems.
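To make the idea of pattern-based programming with automatic workload partitioning more concrete, the following is a minimal sketch in plain Python. It is not DaPPA's API and does not run on UPMEM hardware: the function names (`partition`, `map_pattern`, `reduce_pattern`, `run_pipeline`) are hypothetical, the choice of map and reduce as example patterns is an assumption (the abstract does not name the five primitives), and each "DPU" is simulated as a list chunk on the host.

```python
from functools import reduce
from typing import Callable, List, Sequence


def partition(data: Sequence[int], num_dpus: int) -> List[List[int]]:
    """Split data into num_dpus contiguous chunks of near-equal size.

    Hypothetical stand-in for the framework's automatic workload
    distribution; each chunk represents the slice one DPU would own.
    """
    base, extra = divmod(len(data), num_dpus)
    chunks, start = [], 0
    for i in range(num_dpus):
        size = base + (1 if i < extra else 0)  # spread the remainder
        chunks.append(list(data[start:start + size]))
        start += size
    return chunks


def map_pattern(fn: Callable[[int], int],
                chunks: List[List[int]]) -> List[List[int]]:
    """Apply fn element-wise inside every chunk (simulated per-DPU work)."""
    return [[fn(x) for x in chunk] for chunk in chunks]


def reduce_pattern(fn: Callable[[int, int], int], init: int,
                   chunks: List[List[int]]) -> int:
    """Reduce each chunk locally, then combine the partial results.

    fn is assumed to be associative with init as its identity.
    """
    partials = [reduce(fn, chunk, init) for chunk in chunks]
    return reduce(fn, partials, init)


def run_pipeline(data: Sequence[int], num_dpus: int = 4) -> int:
    """Chain the two patterns: square every element, then sum."""
    chunks = partition(data, num_dpus)
    squared = map_pattern(lambda x: x * x, chunks)
    return reduce_pattern(lambda a, b: a + b, 0, squared)


if __name__ == "__main__":
    print(run_pipeline(list(range(10)), num_dpus=4))
```

In this sketch the caller only states what to compute (square, then sum); how the input is split and how partial results are merged stays inside the helper functions, which is the kind of separation the abstract attributes to the pattern and dataflow interfaces.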

Paper Structure (7 sections, 2 figures)

This paper contains 7 sections, 2 figures.

Figures (2)

  • Figure 1: UPMEM system organization.
  • Figure 2: Overview of the DaPPA programming framework.