GNNBuilder: An Automated Framework for Generic Graph Neural Network Accelerator Generation, Simulation, and Optimization

Stefan Abi-Karam, Cong Hao

TL;DR

GNNBuilder is a generic, end-to-end framework for automated GNN accelerator generation, simulation, optimization, and FPGA deployment, built around a PyTorch-centric workflow. It supports a wide range of GNN models through an explicit message-passing hardware dataflow and provides a template-based code generator, testbenches, and a fast performance model for design space exploration. The performance model achieves a mean absolute percentage error (MAPE) of $36\%$ for latency prediction and $18\%$ for BRAM prediction, and the generated accelerators achieve approximately $6.33\times$ speedup over PyG-CPU and $6.87\times$ over PyG-GPU across multiple datasets. The framework enables push-button hardware acceleration and real-time co-design for diverse GNN architectures, and it is open source for practitioners and researchers.

Abstract

Many graph neural network (GNN) accelerators have been proposed. However, they rely heavily on users' hardware expertise and are usually optimized for one specific GNN model, which makes them challenging to use in practice. Therefore, in this work, we propose GNNBuilder, the first automated, generic, end-to-end GNN accelerator generation framework. It features four advantages: (1) GNNBuilder can automatically generate GNN accelerators for a wide range of GNN models arbitrarily defined by users; (2) GNNBuilder takes the standard PyTorch programming interface, introducing zero overhead for algorithm developers; (3) GNNBuilder supports end-to-end code generation, simulation, accelerator optimization, and hardware deployment, realizing push-button GNN accelerator design; (4) GNNBuilder is equipped with accurate performance models of its generated accelerators, enabling fast and flexible design space exploration (DSE). In the experiments, we first show that our accelerator performance model has errors within $36\%$ for latency prediction and $18\%$ for BRAM count prediction. Second, we show that our generated accelerators can outperform the CPU by $6.33\times$ and the GPU by $6.87\times$. This framework is open-source, and the code is available at https://github.com/sharc-lab/gnn-builder.
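
The TL;DR reports the performance-model errors as MAPE. As a minimal sketch, assuming the standard definition of mean absolute percentage error, the metric can be computed as shown below. The function name `mape` and the sample latency values are hypothetical and chosen only for illustration; they are not taken from the paper or from the GNNBuilder code base.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    Assumes the standard definition: the mean of |actual - predicted| / |actual|.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up post-synthesis latencies and model predictions (illustrative only).
true_latency = [10.0, 20.0, 40.0]
pred_latency = [12.0, 18.0, 50.0]

error = mape(true_latency, pred_latency)
print(f"latency MAPE: {error:.2f}%")
```

A lower value means the performance model tracks the post-synthesis reports more closely, which matters when the model replaces synthesis runs during design space exploration.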

Paper Structure (31 sections, 1 equation, 7 figures, 4 tables)

This paper contains 31 sections, 1 equation, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Workflow of the GNNBuilder framework.
  • Figure 2: The GNNBuilder model architecture for graph level tasks.
  • Figure 3: The high-level hardware kernel architecture for GNNConv layers.
  • Figure 4: Comparison of latency prediction models with true post-synthesis latency and BRAM usage reported from Vitis HLS.
  • Figure 5: Cumulative runtime for evaluating 400 designs to predict model latency and BRAM usage. The x-axis represents time going forward from left to right, and each point represents a performance estimate that has finished computing.
  • ...and 2 more figures