Fast multiplication of random dense matrices with fixed sparse matrices

Tianyu Liang; Riley Murray; Aydın Buluç; James Demmel

TL;DR

The paper accelerates the dense-sparse product $\hat{A}=SA$, where $S$ is a dense random matrix and $A$ is a fixed sparse matrix, by combining blocking/tiling with on-the-fly random number generation to reduce data movement and improve parallelism on shared-memory systems. It develops a memory-movement model and derives a lower bound and a practical computation-inefficiency bound, highlighting the gap between theoretical limits and real-world performance. The authors discuss RNG choices (including counter-based RNGs and Xoshiro), microkernel considerations, and blocking strategies, and they demonstrate practical performance gains and scalability on Intel hardware, along with verification against SuiteSparse. The work also outlines planned extensions, such as a matrix-signature metric for analysis, and benchmarks a randomized least squares solver built on the sketch against SuiteSparse for extremely over-determined problems, where it is competitive in certain regimes.

Abstract

This work focuses on accelerating the multiplication of a dense random matrix with a (fixed) sparse matrix, which is frequently used in sketching algorithms. We develop a novel scheme that takes advantage of blocking and recomputation (on-the-fly random number generation) to accelerate this operation. The techniques we propose decrease memory movement, thereby increasing the algorithm's parallel scalability in shared memory architectures. On the Intel Frontera architecture, our algorithm can achieve 2x speedups over libraries such as Eigen and Intel MKL on some examples. In addition, with 32 threads, we can obtain a parallel efficiency of up to approximately 45%. We also present a theoretical analysis for the memory movement lower bound of our algorithm, showing that under mild assumptions, it's possible to beat the data movement lower bound of general matrix-matrix multiply (GEMM) by a factor of $\sqrt M$, where $M$ is the cache size. Finally, we incorporate our sketching algorithm into a randomized least squares solver. For extremely over-determined sparse input matrices, we show that our results are competitive with SuiteSparse; in some cases, we obtain a speedup of 10x over SuiteSparse.
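The recomputation idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' kernel: the function names (`mix64`, `s_entry`, `sketch_csc`), the CSC layout of $A$, the uniform $[-1, 1)$ distribution, and the use of a SplitMix64-style hash as a stand-in for a counter-based RNG are all assumptions made for illustration. The point it shows is that each entry of $S$ is a pure function of its index, so a block of $S$ can be regenerated when needed instead of being stored and reloaded from memory.

```python
MASK64 = (1 << 64) - 1

def mix64(x):
    """SplitMix64-style finalizer: a stateless 64-bit integer hash."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)

def s_entry(seed, i, k, m):
    """Entry S[i][k] in [-1, 1), recomputed from its index (never stored).

    Assumption: a hash of the linear index i*m + k stands in for a
    counter-based RNG; the paper's actual generators are not reproduced here.
    """
    h = mix64(seed ^ mix64(i * m + k))
    return (h >> 11) / float(1 << 52) - 1.0  # top 53 bits mapped to [-1, 1)

def sketch_csc(d, m, n, col_ptr, row_idx, vals, seed=0, block_rows=4):
    """Return the dense d x n product S @ A, with A (m x n) given in CSC form.

    Rows of S are processed in blocks of `block_rows`; each needed entry of S
    is generated on the fly, so S itself is never materialized.
    """
    out = [[0.0] * n for _ in range(d)]
    for i0 in range(0, d, block_rows):            # block of rows of S and out
        i1 = min(i0 + block_rows, d)
        for j in range(n):                        # column j of A
            for p in range(col_ptr[j], col_ptr[j + 1]):
                k, a_kj = row_idx[p], vals[p]     # nonzero A[k][j]
                for i in range(i0, i1):
                    out[i][j] += s_entry(seed, i, k, m) * a_kj
    return out
```

Because `s_entry` depends only on `(seed, i, k)`, the result is independent of the block size, which is what makes it safe to pick the blocking purely for cache behavior.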

Paper Structure (16 sections, 14 equations, 1 table)

Introduction
Blocking scheme
Blocking Beyond the First Level
Microkernel
RNG Choice
Parallel Matrix Multiply
Implementation and language
Theory (Intel 11700K CPU)
$m \gg n$
A More Realistic Bound
Experiment results
Sequential
Parallel
Verification
Matrix Signature (planned work)
...and 1 more section
