
PIM-STM: Software Transactional Memory for Processing-In-Memory Systems

André Lopes, Daniel Castro, Paolo Romano

TL;DR

The paper tackles the data movement bottleneck in data-intensive workloads by introducing PIM-STM, a software transactional memory (STM) library tailored for Processing-In-Memory systems (specifically UPMEM). It systematically studies a design space of STM implementations (metadata granularity, read visibility, lock timing, and write policy) and evaluates how the choice of PIM memory tier (WRAM vs MRAM) for storing metadata affects performance. The study shows that NOrec-like designs are generally robust, but that no single design is best across all workloads. By porting CPU TM benchmarks (KMeans, Labyrinth) and running single- and multi-DPU experiments, the work reports speedups of up to $14.53\times$ and energy gains of up to $5\times$, while also identifying scenarios that incur energy penalties and highlighting the importance of workload characteristics. Overall, PIM-STM gives developers practical guidance for choosing STM designs on PIM and validates the potential of accelerating CPU TM workloads on UPMEM hardware, paving the way for broader TM-enabled PIM applications.

Abstract

Processing-In-Memory (PIM) is a novel approach that augments existing DRAM memory chips with lightweight logic. By allowing computations to be offloaded to the PIM system, this architecture circumvents the data-bottleneck problem that affects many modern workloads. This work tackles the problem of how to build efficient software implementations of the Transactional Memory (TM) abstraction by introducing PIM-STM, a library that provides a range of diverse TM implementations for UPMEM, the first commercial PIM system. Via an extensive study, we assess the efficiency of alternative choices in the design space of TM algorithms on this emerging architecture. We further quantify the impact of using different memory tiers of the UPMEM system (which offer different latency vs capacity trade-offs) to store the metadata used by different TM implementations. Finally, we assess the gains achievable in terms of performance and memory efficiency when using PIM-STM to accelerate TM applications originally conceived for conventional CPU-based systems.
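
To make the design space concrete, the sketch below enumerates the combinations of the four dimensions named in the TL;DR (metadata granularity, read visibility, lock timing, write policy). The dimension names come from the summary; the option labels are standard STM terminology and are an assumption here, not necessarily the paper's exact list. The `NOREC_LIKE` entry reflects how NOrec is usually characterized (a single global sequence lock, invisible reads, commit-time locking, write-back) and is illustrative only.

```python
from itertools import product

# Dimension names follow the TL;DR; the option labels are standard STM
# terminology and are an assumption, not taken from the paper itself.
DESIGN_SPACE = {
    "metadata_granularity": ("global", "per_location"),
    "read_visibility": ("invisible", "visible"),
    "lock_timing": ("encounter_time", "commit_time"),
    "write_policy": ("write_back", "write_through"),
}

# How NOrec is usually characterized in the STM literature (illustrative).
NOREC_LIKE = {
    "metadata_granularity": "global",
    "read_visibility": "invisible",
    "lock_timing": "commit_time",
    "write_policy": "write_back",
}


def enumerate_designs(space=DESIGN_SPACE):
    """Return every combination of options as a dict keyed by dimension."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in product(*space.values())]


designs = enumerate_designs()
print(len(designs))           # number of raw combinations (2 * 2 * 2 * 2)
print(NOREC_LIKE in designs)  # the NOrec-like point is one of them
```

The raw product overstates what can actually be built: the taxonomy in Figure 2 marks some designs as impossible or impractical, and this sketch does not encode which ones.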

Paper Structure (51 sections, 10 figures)

This paper contains 51 sections, 10 figures.

Figures (10)

  • Figure 1: Internal depiction of an UPMEM PIM chip [gomez2021benchmarking].
  • Figure 2: STM taxonomy. The designs in dashed boxes are either impossible to implement or impractical.
  • Figure 3: Design of the lock table.
  • Figure 4: Throughput, abort rate and time breakdown for ArrayBench and Linked-List with metadata in MRAM.
  • Figure 5: Throughput, abort rate and time breakdown for the KMeans and Labyrinth benchmark with metadata in MRAM.
  • ...and 5 more figures