PARS3: Parallel Sparse Skew-Symmetric Matrix-Vector Multiplication with Reverse Cuthill-McKee Reordering
Selin Yildirim, Murat Manguoglu
TL;DR
This work addresses the bottleneck of sparse skew-symmetric SpMV in iterative solvers by preprocessing the coefficient matrix with Reverse Cuthill-McKee to obtain a band form, then executing a parallel 3-Way Banded Skew-SSpMV using MPI. The method splits the band into inner, middle, and outer regions to exploit locality and reduce synchronization, achieving strong scalability with up to 19x speedup on selected matrices. Key contributions include the first parallel skew-symmetric SpMV kernel, RCM-based banding, and a three-region data layout that balances computation and communication. The approach is applicable to parallel symmetric SpMV as well and offers a practical pathway to accelerate iterative solvers in various scientific computing applications.
Abstract
Sparse matrix computations, a prevalent primitive of many scientific computing algorithms, persist as a processing bottleneck. A skew-symmetric matrix has the structure of a symmetric matrix with the signs of its symmetric pairs flipped, i.e., a_ij = -a_ji. Our work, Parallel 3-Way Banded Skew-Symmetric Sparse Matrix-Vector Multiplication, takes a different perspective from common trends in the literature: it manipulates the form of the matrix in a preprocessing step to accelerate the repeated computations of iterative solvers, and it improves parallel symmetric SpMV kernels equally. We use the Reverse Cuthill-McKee (RCM) reordering algorithm to transform a sparse skew-symmetric matrix into a band matrix, then parallelize the multiplication by splitting the band structure into three parts according to its local sparsity. Our proposed method with RCM is novel in the sense that it is the first implementation of a parallel skew-symmetric SpMV kernel. Our enhancements achieve strong scaling of up to 19x over the serial compressed SpMV implementation. We outperform a heuristic-based graph-coloring approach that relies on synchronization phases to implement parallel symmetric SpMV. Our approach also applies naturally to parallel sparse symmetric SpMV, which may inspire widespread SpMV solutions to adopt the optimizations presented in this paper.
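
The two ingredients named in the abstract, RCM reordering into a band form and a multiplication that exploits skew-symmetry, can be illustrated with a minimal serial sketch. This is not the paper's parallel MPI kernel and does not reproduce its 3-way band split; it assumes SciPy's `reverse_cuthill_mckee` as the reordering routine, and the helper names (`bandwidth`, `skew_spmv_upper`, `rcm_skew_spmv`, `random_skew`) are hypothetical names chosen for illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee


def bandwidth(A):
    """Largest |i - j| over the stored nonzeros of A."""
    coo = sp.coo_matrix(A)
    return int(np.abs(coo.row - coo.col).max()) if coo.nnz else 0


def skew_spmv_upper(U, x):
    """Compute y = (U - U.T) @ x while reading only the strict upper triangle U."""
    U = sp.csr_matrix(U)
    y = np.zeros(U.shape[0], dtype=np.result_type(U.dtype, x.dtype))
    for i in range(U.shape[0]):
        for k in range(U.indptr[i], U.indptr[i + 1]):
            j, a = U.indices[k], U.data[k]
            y[i] += a * x[j]  # stored entry a_ij
            y[j] -= a * x[i]  # mirrored entry a_ji = -a_ij, never stored
    return y


def rcm_skew_spmv(A, x):
    """Reorder A with RCM, multiply in the banded ordering, undo the permutation."""
    A = sp.csr_matrix(A)
    # The sparsity pattern of a skew-symmetric matrix is symmetric.
    perm = reverse_cuthill_mckee(A, symmetric_mode=True)
    B = A[perm, :][:, perm]  # B = P A P^T, which is still skew-symmetric
    y_perm = skew_spmv_upper(sp.triu(B, k=1, format="csr"), x[perm])
    y = np.empty_like(y_perm)
    y[perm] = y_perm  # scatter the result back to the original ordering
    return y, B


def random_skew(n, density, seed):
    """Hypothetical test-matrix generator: random sparse skew-symmetric matrix."""
    R = sp.random(n, n, density=density, random_state=seed, format="csr")
    U = sp.triu(R, k=1, format="csr")
    return (U - U.T).tocsr()


if __name__ == "__main__":
    A = random_skew(200, 0.02, seed=0)
    x = np.arange(200, dtype=float)
    y, B = rcm_skew_spmv(A, x)
    print("bandwidth before RCM:", bandwidth(A))
    print("bandwidth after RCM: ", bandwidth(B))
    print("matches A @ x:", np.allclose(y, A @ x))
```

In an iterative solver the permutation and the reordered matrix would be computed once and reused across iterations, which is why the reordering cost is treated as preprocessing. Each stored entry updates two output rows, so halving the storage introduces write conflicts when rows are distributed across processes; handling those conflicts is the part that a parallel design such as the one described above has to address.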
