SABLE: Staging Blocked Evaluation of Sparse Matrix Computations

Pratyush Das; Amirhossein Basareh; Adhitha Dias; Artem Pelenitsyn; Kirshanthan Sundararajah; Milind Kulkarni

TL;DR

SABLE targets SpMV performance on matrices with structured sparsity by moving beyond purely dense or purely sparse representations. It introduces a staging-based inspector-executor framework that partitions a CSR matrix into a Variable Block Row (VBR) representation and a hybrid VBR-C (Variable Block Row - Compressed) format, then generates region-specific code for high $\delta$-dense blocks while dispatching low $\delta$-dense blocks to a baseline library. The work contributes a novel partitioner, a static classifier that identifies matrices likely to benefit from this approach, and a multi-stage staging compiler that produces specialized C code for the dense regions. On real-world SuiteSparse matrices, SABLE achieves geometric mean speedups of $1.07\times$, $2.73\times$, and $1.9\times$ over Intel MKL, CSR5, and Partially-Strided Codelets, respectively, in single-threaded runs, with larger gains under parallel execution. Overall, SABLE shows that exploiting dense substructures within sparse matrices and compiling region-specific code for them can yield substantial SpMV speedups, while the hybrid storage scheme retains flexibility for the remaining sparse regions.
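The dispatch step described above can be sketched as follows. This is a minimal illustration, not SABLE's partitioner or classifier: it assumes that a block counts as $\delta$-dense when its fraction of non-zero entries is at least $\delta$, and the names `block_density` and `dispatch_blocks`, as well as the tuple layout for blocks, are hypothetical.

```python
# Illustrative sketch: the density definition and all names here are assumptions,
# not SABLE's actual partitioner, classifier, or API.

def block_density(nnz, n_rows, n_cols):
    """Fraction of entries in an n_rows x n_cols block that are non-zero."""
    return nnz / (n_rows * n_cols)


def dispatch_blocks(blocks, delta):
    """Split blocks into those to specialize and those left to a baseline library.

    `blocks` is a list of (n_rows, n_cols, nnz) tuples.
    """
    specialized, baseline = [], []
    for block in blocks:
        n_rows, n_cols, nnz = block
        if block_density(nnz, n_rows, n_cols) >= delta:
            specialized.append(block)  # candidate for region-specific code
        else:
            baseline.append(block)     # handled by a generic sparse routine
    return specialized, baseline
```

With a threshold of 0.5, for example, a fully populated 2x3 block would be specialized, while a 4x4 block holding only two non-zeros would fall through to the baseline path.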

Abstract

Structured sparsity, such as regions of non-zero elements in sparse matrices, offers optimization opportunities that are often overlooked by existing solutions, which treat matrices as entirely dense or entirely sparse. Block-based approaches such as BCSR partially address this issue by choosing among fixed block sizes, but fixed-size blocks waste computation on zero elements. Variable-sized blocks, on the other hand, introduce overhead because their loop bounds are unknown at compile time. We present SABLE, a novel staging framework that achieves the best of both approaches by generating region-specific code tailored to variable-sized blocks. SABLE partitions the matrix to identify profitable blocks and specializes the generated code for vectorization. We evaluate SABLE on the SpMV kernel using the SuiteSparse collection. SABLE achieves geomean speedups of $1.07\times$, $2.73\times$, and $1.9\times$ over the state-of-the-art systems Intel MKL, CSR5, and Partially-Strided Codelets, respectively, when single-threaded, and even more when parallelized.
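To make the contrast between runtime loop bounds and staged, region-specific code concrete, here is a small sketch. It is not SABLE's compiler: SABLE emits specialized C, whereas this example emits Python source for readability and fully unrolls the block instead of producing vectorizable loops. The row-major block layout and the names `spmv_block_generic` and `stage_block_kernel` are assumptions made for illustration.

```python
# Sketch only: the block layout and helper names below are illustrative
# assumptions, not SABLE's actual data structures or generated code.

def spmv_block_generic(vals, r0, r1, c0, c1, x, y):
    """Multiply one dense block (row-major in `vals`) using runtime loop bounds."""
    width = c1 - c0
    for i in range(r0, r1):
        acc = 0.0
        for j in range(c0, c1):
            acc += vals[(i - r0) * width + (j - c0)] * x[j]
        y[i] += acc


def stage_block_kernel(r0, r1, c0, c1):
    """Emit and compile a kernel with this block's bounds baked in as constants."""
    width = c1 - c0
    lines = ["def kernel(vals, x, y):"]
    for i in range(r0, r1):
        terms = " + ".join(
            f"vals[{(i - r0) * width + (j - c0)}] * x[{j}]" for j in range(c0, c1)
        ) or "0.0"  # an empty row contributes nothing
        lines.append(f"    y[{i}] += {terms}")
    lines.append("    return None")  # keeps the body valid for empty blocks
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)  # second stage: compile the generated source
    return namespace["kernel"], source
```

The first stage inspects the block coordinates once and emits code in which every index is a literal; the second stage runs that code with no bound checks or index arithmetic left to do at run time. A C-generating version would follow the same structure, with the emitted text handed to a C compiler.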

Paper Structure (20 sections, 3 equations, 13 figures, 2 tables, 1 algorithm)


Introduction
Background
Sparse tensor compilers
Staging
Sparse Matrix-Vector multiplication (SpMV)
Overview
Variable Block Row - Compressed
Partitioner and Classifier
Staging compiler
Evaluation
Experimental setup
Baselines
Classifier
SABLE execution time
Single-threaded performance
...and 5 more sections

Figures (13)

Figure 1: Matrix bcspwr06 from the SuiteSparse collection.
Figure 2: Matrix represented in the Variable Block Row format.
Figure 3: Overview of SABLE.
Figure 4: Matrix represented in the Variable Block Row - Compressed format.
Figure 5: Blocking a mostly dense region of a sparse matrix.
...and 8 more figures
