SABLE: Staging Blocked Evaluation of Sparse Matrix Computations

Pratyush Das, Amirhossein Basareh, Adhitha Dias, Artem Pelenitsyn, Kirshanthan Sundararajah, Milind Kulkarni

TL;DR

SABLE tackles SpMV performance for matrices with structured sparsity by moving beyond purely dense or sparse representations. It introduces a staging-based inspector-executor framework that partitions CSR matrices into a Variable Block Row (VBR) representation and a hybrid VBR-C format, then generates region-specific code for high $\delta$-dense blocks while dispatching low $\delta$-dense blocks to a baseline library. The work provides a novel partitioner and a static classifier to identify matrices that benefit from this approach, and a multi-stage staging compiler that produces specialized C code for these dense regions. On real-world SuiteSparse matrices, SABLE achieves geometric mean speedups of $1.07\times$, $2.73\times$, and $1.9\times$ over Intel MKL, CSR5, and Partially-Strided Codelets in single-threaded runs, with amplified gains under parallel execution, demonstrating practical impact for structured sparse workloads. Overall, SABLE shows that leveraging dense substructures within sparse matrices and compiling region-specific code can yield substantial performance improvements for SpMV while maintaining flexibility through a hybrid storage scheme.
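
The idea of choosing a code path per block and generating region-specific code can be illustrated with a minimal sketch. It is not SABLE's compiler: the helper names `classify_block` and `emit_block_kernel` are hypothetical, the sketch assumes that "$\delta$-dense" means the fraction of non-zeros in a block is at least $\delta$, and it assumes block values are stored row-major in a flat `val` array. The point it shows is that, once a block's bounds are known at staging time, they can be written into the emitted C as literals, so the C compiler sees fixed trip counts.

```python
def classify_block(nnz, n_rows, n_cols, delta):
    """Pick a code path for one block (hypothetical helper).

    Assumption: a block is "delta-dense" when nnz / (n_rows * n_cols) >= delta.
    """
    density = nnz / (n_rows * n_cols)
    return "specialize" if density >= delta else "baseline"


def emit_block_kernel(row_start, row_end, col_start, col_end, val_offset):
    """Return C source for an SpMV update over one dense block.

    Every loop bound and offset is baked in as a literal.
    Assumption: the block's values are stored row-major starting at val_offset.
    """
    n_cols = col_end - col_start
    return (
        f"for (int i = {row_start}; i < {row_end}; i++) {{\n"
        f"  for (int j = {col_start}; j < {col_end}; j++) {{\n"
        f"    y[i] += val[{val_offset} + (i - {row_start}) * {n_cols}"
        f" + (j - {col_start})] * x[j];\n"
        f"  }}\n"
        f"}}\n"
    )


# A 3 x 4 block holding 11 non-zeros, covering rows 2..4 and columns 7..10.
decision = classify_block(nnz=11, n_rows=3, n_cols=4, delta=0.8)
src = emit_block_kernel(2, 5, 7, 11, val_offset=40)
print(decision)
print(src)
```

In this sketch a block classified as `"baseline"` would simply be left to a general sparse routine, which mirrors the hybrid dispatch described above only in spirit.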

Abstract

Structured sparsity, such as regions of non-zero elements in sparse matrices, can offer optimization opportunities that are often overlooked by existing solutions, which treat matrices as entirely dense or entirely sparse. Block-based approaches, such as BCSR, partially address this issue by using fixed-size blocks, which results in wasted computation on zero elements. Variable-sized blocks, on the other hand, introduce overheads because their loop bounds vary and are unknown at compile time. We present SABLE, a novel staging framework that achieves the best of both approaches by generating region-specific code tailored for variable-sized blocks. SABLE partitions the matrix to identify profitable blocks and specializes the generated code for vectorization. We evaluate SABLE on the SpMV kernel using the SuiteSparse collection. SABLE achieves geomean speedups of $1.07\times$, $2.73\times$, and $1.9\times$ over the state-of-the-art systems Intel MKL, CSR5, and Partially-Strided Codelets, respectively, in single-threaded runs, and larger speedups when parallelized.
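
The cost of fixed-size blocking mentioned in the abstract can be made concrete with a small count. The sketch below is illustrative only: `bcsr_padding` is a hypothetical helper, and it assumes a BCSR-style scheme in which every $r \times c$ tile that contains at least one non-zero is stored (and multiplied) in full, with edge tiles padded to full size.

```python
def bcsr_padding(dense, r, c):
    """Count non-zeros and explicit zeros under a fixed r x c tiling.

    dense: matrix as a list of equal-length rows.
    Assumption: any tile with at least one non-zero is stored in full.
    Returns (nnz, padded_zeros).
    """
    n_rows, n_cols = len(dense), len(dense[0])
    nnz = stored = 0
    for i0 in range(0, n_rows, r):
        for j0 in range(0, n_cols, c):
            tile_nnz = sum(
                1
                for i in range(i0, min(i0 + r, n_rows))
                for j in range(j0, min(j0 + c, n_cols))
                if dense[i][j] != 0
            )
            if tile_nnz:
                nnz += tile_nnz
                stored += r * c  # edge tiles are padded to full size
    return nnz, stored - nnz


# A 6 x 6 matrix with two dense 3 x 3 regions on the diagonal.
A = [[1 if (i < 3) == (j < 3) else 0 for j in range(6)] for i in range(6)]

print(bcsr_padding(A, 2, 2))  # (18, 10): 2 x 2 tiles straddle the regions
print(bcsr_padding(A, 3, 3))  # (18, 0): tiles that match the regions waste nothing
```

With a tile size that does not match the dense regions, 10 of the 28 stored entries are zeros. Blocks sized to the regions avoid that waste, which is the motivation for variable-sized blocks.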

Paper Structure (20 sections, 3 equations, 13 figures, 2 tables, 1 algorithm)


Figures (13)

  • Figure 1: Matrix bcspwr06 from the SuiteSparse collection.
  • Figure 2: Matrix represented in the Variable Block Row format.
  • Figure 3: Overview of SABLE.
  • Figure 4: Matrix represented in the Variable Block Row - Compressed format.
  • Figure 5: Blocking a mostly dense region of a sparse matrix.
  • ...and 8 more figures