High Performance Unstructured SpMM Computation Using Tensor Cores
Patrik Okanovic, Grzegorz Kwasniewski, Paolo Sylos Labini, Maciej Besta, Flavio Vella, Torsten Hoefler
TL;DR
SMaT addresses the challenge of accelerating SpMM on unstructured sparsity by converting CSR inputs to a block-based CSR (BCSR) format and executing a Tensor Core-aware, 2D bottom-up kernel. It combines a row-permutation preprocessing step that densifies blocks with a highly optimized CUDA implementation built on the MMA API and asynchronous data transfers. The approach yields speedups of up to 125x over state-of-the-art libraries, including cuSPARSE, and up to 2,445x over cuSPARSE on synthetic matrices. It also delivers strong improvements on real-world SuiteSparse matrices, particularly at higher sparsity and larger dense-matrix widths. The results show that hardware-aware blocking and careful data movement can unlock Tensor Core performance for general SpMM, broadening its applicability to scientific computing, large-model training, and inference.
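To make the CSR-to-block-CSR step concrete, here is a minimal pure-Python sketch of the data layout, not SMaT's CUDA code. The function names `csr_to_bcsr` and `bcsr_spmm` and the tile shape `br x bc` are hypothetical choices for illustration; on the GPU the tile shape would be dictated by the MMA instruction, which this sketch does not model. The point it shows is that only stored tiles are visited, and each stored tile is multiplied densely, zeros included.

```python
from typing import Dict, List, Tuple

Tile = List[List[float]]


def csr_to_bcsr(n_rows: int, indptr: List[int], indices: List[int],
                data: List[float], br: int, bc: int
                ) -> Tuple[List[int], List[int], List[Tile]]:
    """Convert CSR to block CSR with dense br x bc tiles (hypothetical helper).

    Block row bi owns tiles blocks[block_indptr[bi]:block_indptr[bi + 1]];
    block_cols[k] is the block-column index of tile k. Zeros inside a tile
    are stored explicitly, as a Tensor Core tile would require.
    """
    n_brows = (n_rows + br - 1) // br
    block_indptr: List[int] = [0]
    block_cols: List[int] = []
    blocks: List[Tile] = []
    for bi in range(n_brows):
        tiles: Dict[int, Tile] = {}  # block-column index -> dense tile
        for r in range(bi * br, min((bi + 1) * br, n_rows)):
            for k in range(indptr[r], indptr[r + 1]):
                c = indices[k]
                tile = tiles.get(c // bc)
                if tile is None:  # first non-zero in this block: allocate a zero tile
                    tile = [[0.0] * bc for _ in range(br)]
                    tiles[c // bc] = tile
                tile[r - bi * br][c % bc] = data[k]
        for bj in sorted(tiles):  # keep block columns ordered within a block row
            block_cols.append(bj)
            blocks.append(tiles[bj])
        block_indptr.append(len(block_cols))
    return block_indptr, block_cols, blocks


def bcsr_spmm(n_rows: int, block_indptr: List[int], block_cols: List[int],
              blocks: List[Tile], br: int, bc: int,
              B: List[List[float]]) -> List[List[float]]:
    """Compute C = A @ B, visiting only the stored tiles of A.

    Each tile is multiplied densely (zeros included), mirroring the
    fixed-shape tile product an MMA instruction performs.
    """
    n = len(B[0])
    C = [[0.0] * n for _ in range(n_rows)]
    for bi in range(len(block_indptr) - 1):
        for k in range(block_indptr[bi], block_indptr[bi + 1]):
            tile, bj = blocks[k], block_cols[k]
            for i in range(br):
                r = bi * br + i
                if r >= n_rows:  # padding rows of a partial last block row
                    break
                for j in range(bc):
                    c = bj * bc + j
                    if c >= len(B):  # padding columns of a partial last block column
                        continue
                    a = tile[i][j]
                    for col in range(n):
                        C[r][col] += a * B[c][col]
    return C
```

Absent tiles cost nothing, while every stored zero inside a present tile still costs a multiply; this is why a preprocessing step that packs non-zeros into fewer, denser tiles pays off.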
Abstract
High-performance sparse matrix-matrix multiplication (SpMM) is paramount for science and industry, as ever-growing data sizes rule out dense data structures. Yet existing hardware, such as Tensor Cores (TCs), is ill-suited for SpMM: it imposes strict constraints on data layout that the unstructured sparsity found in many applications cannot meet. To address this, we introduce (S)parse (Ma)trix Matrix (T)ensor Core-accelerated (SMaT): a novel SpMM library that uses TCs for unstructured sparse matrices. Our block-sparse library leverages the low-level CUDA MMA (matrix multiply-accumulate) API, maximizing the performance offered by modern GPUs. Algorithmic optimizations such as sparse matrix permutation further improve performance by minimizing the number of non-zero blocks. The evaluation on an NVIDIA A100 shows that SMaT outperforms state-of-the-art (SotA) libraries (DASP, cuSPARSE, and Magicube) by up to 125x (2.6x on average). SMaT can be used to accelerate many workloads in scientific computing, large-model training, inference, and beyond.
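Why a row permutation reduces the number of non-zero blocks can be seen on a tiny example. The sketch below uses a deliberately simple, hypothetical heuristic (sort rows by the set of block columns they touch, so rows with identical patterns end up in the same block row); it is not SMaT's own permutation algorithm, which this section does not describe. The names `block_pattern`, `count_nonzero_blocks`, and `pattern_sort_permutation` are illustrative only.

```python
from typing import List, Set, Tuple


def block_pattern(r: int, indptr: List[int], indices: List[int],
                  bc: int) -> Tuple[int, ...]:
    """Sorted block-column indices touched by row r."""
    return tuple(sorted({indices[k] // bc for k in range(indptr[r], indptr[r + 1])}))


def count_nonzero_blocks(order: List[int], indptr: List[int], indices: List[int],
                         br: int, bc: int) -> int:
    """Number of non-zero br x bc blocks when rows are laid out in `order`."""
    total = 0
    for start in range(0, len(order), br):
        cols: Set[int] = set()
        for r in order[start:start + br]:  # rows grouped into one block row
            cols.update(indices[k] // bc for k in range(indptr[r], indptr[r + 1]))
        total += len(cols)
    return total


def pattern_sort_permutation(n_rows: int, indptr: List[int], indices: List[int],
                             bc: int) -> List[int]:
    """Hypothetical heuristic: order rows by block-column pattern so that
    rows with identical patterns become adjacent and can share blocks."""
    return sorted(range(n_rows), key=lambda r: block_pattern(r, indptr, indices, bc))
```

For a 4x4 matrix whose rows hold one non-zero each in columns 0, 3, 1, 2, 2x2 blocking of the original order needs 4 non-zero blocks, while the permuted order `[0, 2, 1, 3]` needs only 2. Permuting the rows of A permutes the rows of C as well, so row `order[i]` of the true result lands at position `i` and must be scattered back afterwards. Exact pattern matching rarely fills whole block rows on real matrices, which is why similarity-based clustering of rows is a common alternative to this sketch.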
