PARS3: Parallel Sparse Skew-Symmetric Matrix-Vector Multiplication with Reverse Cuthill-McKee Reordering

Selin Yildirim, Murat Manguoglu

TL;DR

This work addresses the bottleneck of sparse skew-symmetric SpMV in iterative solvers by preprocessing the coefficient matrix with Reverse Cuthill-McKee to obtain a band form, then executing a parallel 3-Way Banded Skew-SSpMV using MPI. The method splits the band into inner, middle, and outer regions to exploit locality and reduce synchronization, achieving strong scalability with up to 19x speedup on selected matrices. Key contributions include the first parallel skew-symmetric SpMV kernel, RCM-based banding, and a three-region data layout that balances computation and communication. The approach is applicable to parallel symmetric SpMV as well and offers a practical pathway to accelerate iterative solvers in various scientific computing applications.

Abstract

Sparse matrices are a prevalent primitive in scientific computing algorithms, and operations on them persist as a processing bottleneck. A skew-symmetric matrix resembles a symmetric matrix, except that the two entries of each pair mirrored across the diagonal have opposite signs. Our work, Parallel 3-Way Banded Skew-Symmetric Sparse Matrix-Vector Multiplication, equally improves parallel symmetric SpMV kernels. It takes a different perspective from common trends in the literature: it manipulates the form of the matrix in a preprocessing step to accelerate the repeated computations of iterative solvers. We use the Reverse Cuthill-McKee (RCM) reordering algorithm to transform a sparse skew-symmetric matrix into a band matrix, then parallelize the multiplication by splitting the band structure into 3 parts according to its local sparsity. The proposed method is novel in that it is the first implementation of a parallel skew-symmetric SpMV kernel. It achieves strong-scaling speedups of up to 19x over the serial compressed SpMV implementation, and it outperforms a heuristic-based graph-coloring approach that relies on synchronization phases to implement parallel symmetric SpMV. Our approach also applies naturally to parallel sparse symmetric SpMV, so the optimizations presented in this paper may be adapted by other SpMV implementations.
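
To make the skew-symmetric property concrete, the following is a minimal serial sketch in Python, not the paper's MPI kernel. It assumes the matrix is given as coordinate triples of its strictly lower triangle only (the diagonal of a real skew-symmetric matrix is zero); the function names `skew_spmv_lower` and `dense_matvec` and the 4x4 example data are hypothetical and chosen for illustration. Each stored entry contributes twice: once directly, and once with flipped sign for its mirrored partner.

```python
def skew_spmv_lower(n, rows, cols, vals, x):
    """Compute y = A x for a skew-symmetric A, given only entries with row > col."""
    y = [0.0] * n
    for i, j, a in zip(rows, cols, vals):
        y[i] += a * x[j]  # stored entry A[i][j] = a
        y[j] -= a * x[i]  # implied mirrored entry A[j][i] = -a
    return y


def dense_matvec(A, x):
    """Reference dense matrix-vector product used to check the sparse routine."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Hypothetical 4x4 example: strictly lower-triangular nonzeros as (row, col, value).
n = 4
rows = [1, 2, 3, 3]
cols = [0, 1, 0, 2]
vals = [2.0, -1.0, 4.0, 3.0]

# Build the full dense matrix only to verify the result.
A = [[0.0] * n for _ in range(n)]
for i, j, a in zip(rows, cols, vals):
    A[i][j] = a
    A[j][i] = -a

x = [1.0, 2.0, 3.0, 4.0]
y = skew_spmv_lower(n, rows, cols, vals, x)
print(y)  # [-20.0, 5.0, -14.0, 13.0]
```

The second update, `y[j] -= a * x[i]`, writes to a row other than the one being traversed. In a parallel setting with row-block ownership, that write is what can collide with another process's output, which is the kind of conflict a banded layout helps to confine.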

Paper Structure (9 sections, 6 equations, 9 figures, 1 table)

Figures (9)

  • Figure 1: Demonstration of RCM algorithm
  • Figure 2: Matrix: Audikw_1 [SPARSKIT]. Illustration of our data decomposition, assuming 4 parallel processes with block distribution. Purple squares mark conflicting regions (R2); yellow squares mark regions that are safe for concurrency (R1). Multiplying the elements of R2 by the corresponding elements of the input vector races on the output vector with the paired R2 elements in the transpose region, because different processes write to the same output location. An element is classified as conflicting by checking its column offset within the current process, where each process owns a row block (in the upper or lower triangular region).
  • Figure 3: Serial SSpMV with SSS
  • Figure 4: RCM-transformed matrix: boneS10 [SPARSKIT]
  • Figure 5: The effectiveness of RCM in reducing bandwidth and reorganizing the data more compactly depends on the initial matrix structure. The fewer nonzeros a matrix has, the easier it is to restructure it to a reduced bandwidth.
  • ...and 4 more figures
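
As a rough illustration of the bandwidth reduction that Figures 1 and 5 refer to, here is a small pure-Python sketch of Reverse Cuthill-McKee on an adjacency-set graph. It is a simplified textbook variant and not the paper's implementation: it starts each component from the lowest-degree node rather than searching for a pseudo-peripheral node, and the names `rcm_order` and `bandwidth` and the 7-node example graph are assumptions made for this example.

```python
from collections import deque


def bandwidth(adj, order):
    """Largest index distance between any two adjacent nodes under the given ordering."""
    pos = {v: k for k, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


def rcm_order(adj):
    """Reverse Cuthill-McKee ordering of an undirected graph {node: set(neighbors)}."""
    visited = set()
    order = []
    # Simplification: start each connected component at its lowest-degree node.
    for start in sorted(adj, key=lambda v: (len(adj[v]), v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            # Enqueue unvisited neighbors in order of increasing degree.
            for v in sorted(adj[u] - visited, key=lambda w: (len(adj[w]), w)):
                visited.add(v)
                queue.append(v)
    order.reverse()  # the "reverse" step of RCM
    return order


# Hypothetical example: a path graph whose labels are scrambled (0-4-1-5-2-6-3).
edges = [(0, 4), (4, 1), (1, 5), (5, 2), (2, 6), (6, 3)]
adj = {v: set() for v in range(7)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

natural = list(range(7))
order = rcm_order(adj)
print(bandwidth(adj, natural))  # 4
print(bandwidth(adj, order))    # 1
```

Here the graph's edges stand for the nonzero pattern of a structurally symmetric matrix, so relabeling the nodes by `order` corresponds to a symmetric row and column permutation. The path graph is an easy case in which the reordered matrix becomes tridiagonal; denser or less regular patterns generally see smaller reductions.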