A sparsity-aware distributed-memory algorithm for sparse-sparse matrix multiplication

Yuxi Hong; Aydin Buluc

TL;DR

This work introduces a sparsity-aware 1D SpGEMM algorithm for distributed-memory systems that uses MPI RDMA and a block-fetch strategy to fetch only the blocks of matrix A needed for local computation. By preserving the original sparsity structure and optionally applying graph partitioning, it reduces communication significantly compared to sparsity-oblivious 2D/3D approaches, and it achieves strong scalability on real-world sparse matrices. The method is implemented in CombBLAS with MPI+OpenMP and demonstrates substantial performance advantages in squaring, Galerkin-like restriction operations, and betweenness centrality workloads, particularly when the partitioning is well chosen. The paper also provides practical guidance on when to apply graph partitioning versus random permutation and discusses integration with existing solvers such as PETSc and Trilinos, positioning the approach as a high-performance primitive for SpGEMM-related applications.

Abstract

Multiplying two sparse matrices (SpGEMM) is a common computational primitive used in many areas including graph algorithms, bioinformatics, algebraic multigrid solvers, and randomized sketching. Distributed-memory parallel algorithms for SpGEMM have mainly focused on sparsity-oblivious approaches that use 2D and 3D partitioning. Sparsity-aware 1D algorithms can theoretically reduce communication by not fetching nonzeros of the sparse matrices that do not participate in the multiplication. Here, we present a distributed-memory 1D SpGEMM algorithm and implementation. It uses MPI RDMA operations to mitigate the cost of packing/unpacking submatrices for communication, and it uses a block fetching strategy to avoid excessive fine-grained messaging. Our results show that our 1D implementation outperforms state-of-the-art 2D and 3D implementations within CombBLAS for many configurations, inputs, and use cases, while remaining conceptually simpler.
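
To make the sparsity-aware idea concrete, here is a minimal single-process sketch in Python. It is not the CombBLAS implementation. It assumes a block 1D column distribution of both matrices, stores each matrix as a dictionary of sparse columns, and models the block fetching strategy as fixed-width column chunks per owning process. The names `needed_columns`, `chunk_requests`, `local_spgemm`, and the parameter `chunks_per_proc` are hypothetical and chosen only for illustration; the actual MPI RDMA transfers are not modeled.

```python
from collections import defaultdict


def needed_columns(B_local):
    """Row indices k with at least one nonzero in the local columns of B.

    Only the matching columns A(:, k) can contribute to the local part of C,
    so these are the only columns of A worth fetching.
    """
    return sorted({k for col in B_local.values() for k in col})


def chunk_requests(cols, n_cols, n_procs, chunks_per_proc, my_rank):
    """Group needed remote columns into coarse (owner, chunk) requests.

    Assumption: columns are block-distributed, and each owner's block is cut
    into `chunks_per_proc` fixed-width chunks. One request per touched chunk
    replaces one request per column.
    """
    block = -(-n_cols // n_procs)           # columns per process (ceiling)
    width = -(-block // chunks_per_proc)    # columns per chunk (ceiling)
    requests = set()
    for c in cols:
        owner = c // block
        if owner == my_rank:
            continue                        # already local, nothing to fetch
        requests.add((owner, (c - owner * block) // width))
    return sorted(requests)


def local_spgemm(A_cols, B_local):
    """Column-by-column product: C(:, j) = sum over k of A(:, k) * B(k, j)."""
    C = {}
    for j, b_col in B_local.items():
        acc = defaultdict(float)
        for k, b_kj in b_col.items():
            for i, a_ik in A_cols.get(k, {}).items():
                acc[i] += a_ik * b_kj
        if acc:
            C[j] = dict(acc)
    return C


# Toy 8x8 example: two processes, rank 0 owns columns 0..3 of A and B.
A = {1: {0: 1.0, 3: 4.0}, 5: {3: 2.0}, 6: {7: 9.0}}   # column -> {row: value}
B_local = {0: {1: 2.0}, 2: {5: 1.0, 1: 3.0}}          # rank 0's columns of B

cols = needed_columns(B_local)
reqs = chunk_requests(cols, n_cols=8, n_procs=2, chunks_per_proc=2, my_rank=0)
C_local = local_spgemm({k: A[k] for k in cols if k in A}, B_local)
```

In this toy case rank 0 needs only columns 1 and 5 of A. Column 6 holds a nonzero but is never requested on its own account, which is the communication saving the abstract describes; a sparsity-oblivious scheme would move it regardless. Coarser chunks mean fewer messages but may pull in columns that are not needed, so the chunk width is a trade-off rather than a fixed choice.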

Paper Structure (25 sections, 1 equation, 14 figures, 3 tables, 3 algorithms)

Introduction
Background and Related Work
Distributed Parallel SpGEMM Algorithms
Random Permutation and Graph Partitioning
Random Permutation
Graph Partitioning
SpGEMM Applications
Squaring
Algebraic Multigrid Solvers
Betweenness Centrality
Sparsity-aware 1D SpGEMM Algorithm
Main Algorithm and Block Fetch Strategy
Graph Partitioning
Implementation Details
Experiment Results
...and 10 more sections

Figures (14)

Figure 1: An example of the sparsity-aware 1D SpGEMM algorithm (Algorithm \ref{alg:spgemm1d-cbc}) and the block fetching strategy (Algorithm \ref{alg:kchunkgroup}).
Figure 2: nlpkkt200 visualization
Figure 3: hv15r visualization
Figure 4: Impact of permutation strategies on hv15r (upper two figures) and eukarya (lower three figures) datasets in squaring operation. Note that hv15r doesn't have METIS Permutation.
Figure 5: Communication volume comparison of different permutation strategies in squaring operation.
...and 9 more figures
