A sparsity-aware distributed-memory algorithm for sparse-sparse matrix multiplication

Yuxi Hong, Aydin Buluc

TL;DR

This work introduces a sparsity-aware 1D SpGEMM algorithm for distributed-memory systems that fetches only the A data blocks needed for local computation using MPI RDMA and a block-fetch strategy. By preserving the original sparsity structure and optionally applying graph partitioning, it reduces communication significantly compared to sparsity-oblivious 2D/3D approaches, and it achieves strong scalability on real-world sparse matrices. The method is implemented in CombBLAS with MPI+OpenMP and demonstrates substantial performance advantages in squaring, Galerkin-like restriction operations, and betweenness centrality workloads, particularly when partitioning is well-chosen. The paper also provides practical guidance on when to apply graph partitioning versus random permutation and discusses integration with existing solvers like PETSc and Trilinos, highlighting the approach as a high-performance primitive for SpGEMM-related applications.

Abstract

Multiplying two sparse matrices (SpGEMM) is a common computational primitive used in many areas including graph algorithms, bioinformatics, algebraic multigrid solvers, and randomized sketching. Distributed-memory parallel algorithms for SpGEMM have mainly focused on sparsity-oblivious approaches that use 2D and 3D partitioning. Sparsity-aware 1D algorithms can theoretically reduce communication by not fetching nonzeros of the sparse matrices that do not participate in the multiplication. Here, we present a distributed-memory 1D SpGEMM algorithm and implementation. It uses MPI RDMA operations to mitigate the cost of packing/unpacking submatrices for communication, and it uses a block fetching strategy to avoid excessive fine-grained messaging. Our results show that our 1D implementation outperforms state-of-the-art 2D and 3D implementations within CombBLAS for many configurations, inputs, and use cases, while remaining conceptually simpler.
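The sparsity-aware idea can be illustrated with a minimal sketch. It is not the CombBLAS implementation: it assumes a column-wise 1D layout in which each process owns a contiguous block of columns of A, B, and C, and it stands in for the MPI RDMA fetch with a plain dictionary lookup. The names `needed_columns`, `owner`, and `local_spgemm`, and the dict-of-dicts column storage, are hypothetical and chosen only for illustration. Because C(:, j) = Σ_k A(:, k) · B(k, j), a process needs column k of A only if some locally owned column of B has a nonzero in row k.

```python
from collections import defaultdict

# Sparse matrices are stored column-wise as {column index: {row index: value}}.
# This storage format is an assumption made for the sketch.

def needed_columns(B_local):
    """Indices k such that some locally owned column of B has a nonzero in row k."""
    return sorted({k for col in B_local.values() for k in col})

def owner(k, n_cols, n_procs):
    """Hypothetical block distribution: rank that owns column k."""
    block = -(-n_cols // n_procs)  # ceiling division
    return k // block

def local_spgemm(A_cols, B_local):
    """Compute C(:, j) = sum_k A(:, k) * B(k, j) for every locally owned j."""
    C = {}
    for j, b_col in B_local.items():
        acc = defaultdict(float)
        for k, b in b_col.items():
            for i, a in A_cols[k].items():
                acc[i] += a * b
        C[j] = dict(acc)
    return C

# A 4x4 example distributed over 2 ranks; rank 1 owns columns 2 and 3.
A = {0: {0: 1.0}, 1: {1: 2.0, 3: 1.0}, 2: {2: 3.0}, 3: {0: 4.0}}
B_rank1 = {2: {1: 5.0}, 3: {1: 1.0, 3: 2.0}}

needed = needed_columns(B_rank1)                      # columns of A that participate
remote = [k for k in needed if owner(k, 4, 2) != 1]   # columns owned by another rank
fetched = {k: A[k] for k in needed}                   # stand-in for the remote fetch
C_rank1 = local_spgemm(fetched, B_rank1)
```

In this example rank 1 touches only columns 1 and 3 of A, and only column 1 lives on another rank, so columns 0 and 2 are never communicated. A sparsity-oblivious scheme would move them regardless of whether they contribute to the product.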

Paper Structure (25 sections, 1 equation, 14 figures, 3 tables, 3 algorithms)

This paper contains 25 sections, 1 equation, 14 figures, 3 tables, 3 algorithms.

Figures (14)

  • Figure 1: An example of the sparsity-aware 1D SpGEMM algorithm (Algorithm \ref{alg:spgemm1d-cbc}) and the block fetching strategy (Algorithm \ref{alg:kchunkgroup}).
  • Figure 2: nlpkkt200 visualization
  • Figure 3: hv15r visualization
  • Figure 4: Impact of permutation strategies on the hv15r (upper two figures) and eukarya (lower three figures) datasets in the squaring operation. Note that hv15r does not have a METIS permutation.
  • Figure 5: Communication volume comparison of different permutation strategies in the squaring operation.
  • ...and 9 more figures