Accelerating Graph Neural Networks with a Novel Matrix Compression Format

João N. F. Alves, Samir Moustafa, Siegfried Benkner, Alexandre P. Francisco, Wilfried N. Gansterer, Luís M. S. Russo

TL;DR

This work tackles the bottleneck of repeated adjacency-based matrix multiplications in Graph Neural Networks by introducing the Compressed Binary Matrix (CBM) format, which encodes each binary adjacency row as a delta relative to a similar row, with the chain of reference rows chosen by a minimum spanning tree (MST). By converting the binary structure into a real-valued matrix $\mathbf{A}'$ and combining a high-performance SpMM kernel (e.g., Intel MKL) with a delta-based update applied in topological order, CBM achieves substantial speedups while guaranteeing that, even in the worst case, the operation count does not exceed that of CSR-based multiplication. The authors extend CBM to normalized adjacency matrices and propose edge pruning with a tunable threshold $\alpha$ to balance compression against throughput, achieving SpMM speedups of up to $\approx 5\times$ and up to $\approx 3\times$ faster 2-layer GCN inference on several real-world datasets. The approach is integrated with PyTorch and keeps preprocessing time reasonable (under 16 seconds on the largest tested graph). The gains are dataset-dependent, which leaves room to optimize CBM across diverse GNN architectures and hardware such as GPUs.

Abstract

The inference and training stages of Graph Neural Networks (GNNs) are often dominated by the time required to compute a long sequence of matrix multiplications between the sparse graph adjacency matrix and its embedding. To accelerate these stages, we first propose the Compressed Binary Matrix (CBM) storage format to succinctly represent the binary adjacency matrix of an unweighted graph. Then, we show how to generalize this representation to normalized adjacency matrices of unweighted graphs which arise in the context of GNNs. Finally, we develop efficient matrix multiplication kernels based on this compressed representation. The matrix multiplication kernels proposed in this work never require more scalar operations than classic sparse matrix multiplication algorithms. Experimental evaluation shows that the matrix multiplication strategies proposed outperform the current state-of-the-art implementations provided by Intel MKL, achieving speedups close to 5$\times$. Furthermore, our optimized matrix-multiplication strategies accelerated the inference time of a GNN by up to $3\times$.
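The delta-encoding idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses dense arrays for readability, a plain Prim's algorithm over Hamming distances, a virtual all-zero row as the tree root, and it omits the $\alpha$ pruning step and the normalized-adjacency extension. The names `build_cbm` and `cbm_matmul` are hypothetical.

```python
import numpy as np

def build_cbm(A):
    """Delta-encode the rows of a binary matrix along an MST of Hamming distances.

    Node 0 is a virtual all-zero row; node i + 1 stands for row i of A.
    Returns (parent, order, delta), where delta[i] = row i minus its parent row,
    so delta has entries in {-1, 0, 1}.
    """
    m, n = A.shape
    rows = np.vstack([np.zeros((1, n), dtype=np.int64), A.astype(np.int64)])
    # Hamming distance between binary rows x and y: nnz(x) + nnz(y) - 2 * <x, y>.
    nnz = rows.sum(axis=1)
    dist = nnz[:, None] + nnz[None, :] - 2 * (rows @ rows.T)
    # Dense Prim's algorithm rooted at the virtual empty row.
    in_tree = np.zeros(m + 1, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()                    # cheapest known link into the tree
    parent = np.zeros(m + 1, dtype=np.int64)
    order = []                               # insertion order: parents come first
    for _ in range(m):
        candidates = np.where(in_tree, np.iinfo(np.int64).max, best)
        v = int(np.argmin(candidates))
        in_tree[v] = True
        order.append(v)
        closer = (dist[v] < best) & ~in_tree
        best[closer] = dist[v][closer]
        parent[closer] = v
    delta = rows[1:] - rows[parent[1:]]      # real-valued stand-in for A'
    return parent, order, delta

def cbm_matmul(parent, order, delta, X):
    """Compute A @ X from the delta matrix plus an update along the tree."""
    Y = (delta @ X).astype(float)            # stands in for an optimized SpMM call
    for v in order:                          # parents are finalized before children
        p = parent[v]
        if p != 0:                           # children of the virtual row need no fix-up
            Y[v - 1] += Y[p - 1]
    return Y

A = np.array([[1, 1, 0, 1],
              [1, 1, 0, 0],
              [1, 1, 1, 1],
              [0, 0, 1, 0]])
X = np.arange(8, dtype=float).reshape(4, 2)
parent, order, delta = build_cbm(A)
print(np.allclose(cbm_matmul(parent, order, delta, X), A @ X))
print(np.count_nonzero(delta), "delta entries vs", int(A.sum()), "nonzeros in A")
```

In this sketch the number of nonzero deltas equals the MST weight, which cannot exceed $\mathbf{nnz}(\mathbf{A})$ because attaching every row directly to the virtual empty row is itself a spanning tree of exactly that weight. That is one way to see why a delta-based product need not perform more scalar operations than a row-by-row sparse product.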

Paper Structure (21 sections, 2 theorems, 6 equations, 2 figures, 1 table, 1 algorithm)

This paper contains 21 sections, 2 theorems, 6 equations, 2 figures, 1 table, 1 algorithm.

Key Result

Lemma 2.1

Any matrix $\mathbf{A}\in\{0,1\}^{m \times n}$ can be represented in CBM format in $O((m+1)\ \mathbf{nnz}(\mathbf{A}) + m^2 \log m)$ time.
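One plausible way to account for the two terms of this bound is sketched below. It is an assumption-laden reading, not the paper's proof: it supposes that the construction first computes pairwise row distances column by column, where $c_j$ (a symbol introduced here) is the number of nonzeros in column $j$, and then runs a comparison-based minimum spanning tree algorithm over the $O(m^2)$ candidate edges between rows.

$$
\underbrace{\sum_{j=1}^{n} c_j^2 \;\le\; m \sum_{j=1}^{n} c_j \;=\; m\,\mathbf{nnz}(\mathbf{A})}_{\text{pairwise row distances}}
\qquad\qquad
\underbrace{O\big(m^2 \log m^2\big) = O\big(m^2 \log m\big)}_{\text{MST over } O(m^2) \text{ edges}}
$$

Under this reading, the extra $\mathbf{nnz}(\mathbf{A})$ in the factor $(m+1)$ would be consistent with one further linear pass over the nonzeros of $\mathbf{A}$.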

Figures (2)

  • Figure 1: Performance impact of different $\alpha$ values for the CBM format. These plots compare runtime reduction in sequential and parallel environments, alongside the compression ratio, across various $\alpha$ values. The x-axis lists the $\alpha$ values, the left y-axis shows runtime reduction relative to the original sparse-dense matrix product (SpMM), and the right y-axis shows the compression ratio relative to the dataset size in CSR.
  • Figure 2: Performance impact of integrating the CBM format and its corresponding sparse-dense matrix multiplication (SpMM) kernel into the inference stage of a 2-layer GCN across the different datasets. The y-axis shows the runtime reduction in GNN inference achieved by CBM-based matrix multiplication relative to the CSR-based matrix multiplication kernel provided by Intel MKL.

Theorems & Definitions (4)

  • Lemma 2.1
  • proof
  • Lemma 2.2
  • proof