PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures

Christina Giannoula, Peiming Yang, Ivan Fernandez, Jiacheng Yang, Sankeerth Durvasula, Yu Xin Li, Mohammad Sadrosadati, Juan Gomez Luna, Onur Mutlu, Gennady Pekhimenko

TL;DR

PyGim tackles the memory-bandwidth bottleneck of GNN aggregation by co-designing a real-world near-bank PIM workflow in which the aggregation step runs on PIM cores and the compute-heavy combination step runs on the host. It introduces Cooperative Acceleration (CoA) and Parallelism Fusion (PaF) to dynamically map and balance sparse SpMM and dense GEMM kernels across heterogeneous hardware, aided by a lightweight autotuner that predicts the best aggregation configuration. On a real UPMEM system with 1992 PIM cores, PyGim achieves substantial gains over CPU and GPU baselines (e.g., ~3x–4x end-to-end inference speedups and up to ~11.6x higher PIM utilization) and outperforms prior PIM schemes by several-fold in both performance and energy efficiency. The work provides practical insights and design guidelines for software, hardware, and hardware-software co-design to unlock memory-centric acceleration for sparse ML workloads, and releases an open-source PyGim library for broader adoption. The combination of a flexible CoA/PaF framework, tunable configuration, and real-system validation demonstrates the viability of memory-centric GNN processing in near-bank PIM architectures and informs future hardware prototyping and software stacks.
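To make the memory-bandwidth argument concrete, the following back-of-the-envelope sketch compares the arithmetic intensity (FLOPs per byte moved) of an SpMM aggregation with that of a dense GEMM combination. It is not taken from the paper or the PyGim code base: the function names, the traffic model (no cache reuse of gathered feature rows for SpMM, each GEMM operand moved exactly once), and the graph sizes are all assumptions chosen for illustration.

```python
# Rough arithmetic-intensity estimates. All names, sizes, and the traffic
# model are illustrative assumptions, not measurements from the paper.

def spmm_intensity(num_nodes, nnz, feat_dim, bytes_per_value=4, bytes_per_index=4):
    """FLOPs per byte for (sparse adjacency) x (dense features).

    Assumes every nonzero reads its value, its column index, and one full
    feature row from memory (no reuse), and the output is written once.
    """
    flops = 2 * nnz * feat_dim  # one multiply and one add per element
    bytes_moved = (
        nnz * (bytes_per_value + bytes_per_index)   # sparse matrix entries
        + nnz * feat_dim * bytes_per_value          # gathered feature rows
        + num_nodes * feat_dim * bytes_per_value    # output write-back
    )
    return flops / bytes_moved


def gemm_intensity(num_nodes, feat_dim, hidden_dim, bytes_per_value=4):
    """FLOPs per byte for (dense features) x (dense weights).

    Assumes each of the three matrices is moved exactly once.
    """
    flops = 2 * num_nodes * feat_dim * hidden_dim
    bytes_moved = bytes_per_value * (
        num_nodes * feat_dim + feat_dim * hidden_dim + num_nodes * hidden_dim
    )
    return flops / bytes_moved


# Hypothetical graph: 100k nodes, 1M edges, 256-wide features and hidden layer.
aggregation = spmm_intensity(num_nodes=100_000, nnz=1_000_000, feat_dim=256)
combination = gemm_intensity(num_nodes=100_000, feat_dim=256, hidden_dim=256)
print(f"aggregation: {aggregation:.2f} FLOPs/byte")
print(f"combination: {combination:.2f} FLOPs/byte")
```

Under these assumptions the aggregation performs less than one FLOP per byte moved, while the combination performs tens of FLOPs per byte. That gap is what a roofline analysis captures when it classifies aggregation as memory-bound and combination as compute-bound.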

Abstract

Graph Neural Networks (GNNs) are emerging ML models for analyzing graph-structured data. GNN execution involves both compute-intensive and memory-intensive kernels; the latter dominate the total execution time and are significantly bottlenecked by data movement between memory and processors. Processing-In-Memory (PIM) systems can alleviate this data movement bottleneck by placing simple processors near or inside memory arrays. In this work, we introduce PyGim, an efficient ML library that accelerates GNNs on real PIM systems. We propose intelligent parallelization techniques for the memory-intensive kernels of GNNs, tailored for real PIM systems, and develop a handy Python API for them. We provide hybrid GNN execution, in which the compute-intensive and memory-intensive kernels are executed in processor-centric and memory-centric computing systems, respectively. We extensively evaluate PyGim on a real-world PIM system with 1992 PIM cores using emerging GNN models, and demonstrate that it outperforms its state-of-the-art CPU counterpart on Intel Xeon by 3.04x on average, and achieves higher resource utilization than CPU and GPU systems. Our work provides useful recommendations for software, system, and hardware designers. PyGim is publicly available at https://github.com/CMU-SAFARI/PyGim.
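The split between memory-intensive and compute-intensive kernels can be illustrated with a minimal, pure-Python GNN layer. This is only a sketch of the two kernel types, not the PyGim API: the function names `aggregate`, `combine`, and `gnn_layer`, the CSR layout, the ReLU activation, and the tiny example graph are all assumptions made for illustration. In the hybrid execution described above, a kernel like `aggregate` (SpMM) is the part that would be offloaded to the memory-centric system, while a kernel like `combine` (GEMM) would stay on the processor-centric system.

```python
# Illustrative stand-ins for the two kernel types in a GNN layer.
# None of these names come from the PyGim library.

def aggregate(indptr, indices, values, features):
    """Memory-intensive kernel: CSR sparse matrix times dense features (SpMM)."""
    num_rows = len(indptr) - 1
    num_cols = len(features[0])
    out = [[0.0] * num_cols for _ in range(num_rows)]
    for row in range(num_rows):
        # Irregular accesses: each neighbor's feature row is gathered from memory.
        for k in range(indptr[row], indptr[row + 1]):
            neighbor, weight = indices[k], values[k]
            for c in range(num_cols):
                out[row][c] += weight * features[neighbor][c]
    return out


def combine(aggregated, weights):
    """Compute-intensive kernel: dense matrix multiply (GEMM) followed by ReLU."""
    hidden_dim = len(weights[0])
    return [
        [
            max(0.0, sum(row[i] * weights[i][j] for i in range(len(row))))
            for j in range(hidden_dim)
        ]
        for row in aggregated
    ]


def gnn_layer(indptr, indices, values, features, weights):
    """One layer: aggregate neighbor features, then combine with the weights."""
    return combine(aggregate(indptr, indices, values, features), weights)


# Hypothetical 3-node path graph 0-1-2 with self-loops, stored in CSR form.
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
values = [1.0] * 7
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, -1.0], [0.5, 1.0]]

output = gnn_layer(indptr, indices, values, features, weights)
print(output)
```

The adjacency matrix here is left unnormalized to keep the arithmetic easy to follow by hand; a real GCN layer would typically apply degree normalization to the edge weights before aggregation.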

Paper Structure (29 sections, 22 figures, 9 tables, 2 algorithms)

Figures (22)

  • Figure 1: Overview of the GNN layer execution workflow.
  • Figure 1: PyGim tuner for the aggregation operator.
  • Figure 2: Roofline model in the NVIDIA RTX 3090 GPU for aggregation and combination kernels.
  • Figure 2: Example of GCN execution with PyGim API.
  • Figure 3: Overview of a real near-bank PIM system. Host has access to $m$ standard and $n$ PIM-enabled modules.
  • ...and 17 more figures