Floating Point Compression of Hierarchical Matrix Formats and its Impact on Matrix-Vector Multiplication
Ronald Kriemann
TL;DR
This work tackles memory bandwidth as the primary bottleneck of matrix-vector multiplication (MVM) with hierarchical matrices by combining floating point compression with the $\mathcal{H}$-, uniform $\mathcal{H}$-, and $\mathcal{H}^2$-matrix formats. It develops arithmetic based on on-the-fly decompression and investigates AFLP and FPX compression alongside VALR for low-rank blocks, all while preserving the prescribed accuracy $\varepsilon$. The author demonstrates substantial speedups for compressed MVM (up to roughly $2$–$3\times$ for $\mathcal{H}$, $1.5$–$2.5\times$ for $\mathcal{U}\mathcal{H}$, with competitive results for $\mathcal{H}^2$) together with reduced memory footprints, and provides a detailed parallel, memory-aware algorithmic framework. The approach is most relevant for memory-bound hierarchical solvers and may generalize to related linear-algebra kernels, which indicates that accuracy-controlled floating point compression is practically viable in hierarchical matrix workflows.
Abstract
Matrix-vector multiplication forms the basis of many iterative solution algorithms and is therefore also an important operation for hierarchical matrices, which represent dense data in an optimized form by applying low-rank compression. However, due to its low computational intensity, the performance of matrix-vector multiplication is typically limited by the available memory bandwidth on parallel systems. Floating point compression reduces the memory footprint, which relieves the pressure on the memory subsystem and thereby increases performance. We look into the compression of different hierarchical matrix formats and how it can be used to speed up the corresponding matrix-vector multiplication.
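As a rough illustration of accuracy-controlled floating point compression with on-the-fly decompression, the following Python sketch stores each matrix entry using only as many leading bytes of its IEEE 754 double representation as a prescribed relative accuracy requires, and expands entries only while the product is being accumulated. It is a simplified stand-in based on byte truncation, not the AFLP or FPX schemes and not the paper's implementation; the function names `bytes_per_value`, `compress`, `decompress_at`, and `compressed_mvm` are hypothetical.

```python
import math
import struct


def bytes_per_value(eps):
    """Bytes needed per value so that truncation keeps relative error <= eps.

    Assumption: IEEE 754 doubles (1 sign bit, 11 exponent bits, 52 mantissa
    bits) and normal (non-denormal) values.
    """
    m = max(1, math.ceil(-math.log2(eps)))  # mantissa bits with 2**-m <= eps
    return min(8, math.ceil((12 + m) / 8))  # sign + exponent + m bits, in bytes


def compress(values, nbytes):
    """Keep only the leading nbytes of each big-endian double."""
    return b"".join(struct.pack(">d", v)[:nbytes] for v in values)


def decompress_at(buf, i, nbytes):
    """Rebuild the i-th value by padding the dropped mantissa bytes with zeros."""
    chunk = buf[i * nbytes:(i + 1) * nbytes] + b"\x00" * (8 - nbytes)
    return struct.unpack(">d", chunk)[0]


def compressed_mvm(rows, nbytes, x):
    """y = A x, where each row of A is a compressed byte string.

    Entries are decompressed one at a time during accumulation, so the
    matrix is never held in full double precision.
    """
    return [
        sum(decompress_at(row, j, nbytes) * xj for j, xj in enumerate(x))
        for row in rows
    ]


if __name__ == "__main__":
    eps = 1e-4
    nbytes = bytes_per_value(eps)
    # Small dense test matrix with entries 1 / (1 + i + j).
    A = [[1.0 / (1 + i + j) for j in range(4)] for i in range(4)]
    x = [1.0, 2.0, 3.0, 4.0]
    rows = [compress(r, nbytes) for r in A]
    y = compressed_mvm(rows, nbytes, x)
    y_ref = [sum(a * b for a, b in zip(r, x)) for r in A]
    print("bytes per value:", nbytes)
    print("max relative error:", max(abs(a - b) / abs(b) for a, b in zip(y, y_ref)))
```

Byte granularity is chosen here only to keep decoding trivial. The point of the sketch is that the amount of data read per matrix entry shrinks with the requested accuracy, which is the quantity that matters for a bandwidth-bound kernel.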
