
Floating Point Compression of Hierarchical Matrix Formats and its Impact on Matrix-Vector Multiplication

Ronald Kriemann

TL;DR

This work tackles memory bandwidth as a primary bottleneck in matrix-vector multiplication (MVM) for hierarchical matrices representing dense data, by integrating floating point compression with the $\mathcal{H}$-, uniform $\mathcal{H}$- ($\mathcal{U}\mathcal{H}$-), and $\mathcal{H}^2$-matrix formats. It develops arithmetic based on on-the-fly decompression and investigates AFLP and FPX compression alongside VALR for low-rank blocks, while preserving a prescribed accuracy $\varepsilon$. The paper reports substantial speedups for compressed MVM (up to roughly $2$–$3\times$ for $\mathcal{H}$, $1.5$–$2.5\times$ for $\mathcal{U}\mathcal{H}$, with competitive results for $\mathcal{H}^2$) together with reduced memory footprints, and provides a detailed parallel, memory-aware algorithmic framework. The approach is especially relevant for memory-bound hierarchical solvers and can generalize to related linear-algebra kernels, which highlights the practical viability of accuracy-controlled floating point compression in hierarchical matrix workflows.

Abstract

Matrix-vector multiplication forms the basis of many iterative solution algorithms and as such is also an important algorithm for hierarchical matrices, which are used to represent dense data in an optimized form by applying low-rank compression. However, due to its low computational intensity, the performance of matrix-vector multiplication is typically limited by the available memory bandwidth on parallel systems. With floating point compression the memory footprint can be optimized, which reduces the stress on the memory subsystem and thereby increases performance. We will look into the compression of different formats of hierarchical matrices and how this can be used to speed up the corresponding matrix-vector multiplication.
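
To make the idea of accuracy-controlled compression with on-the-fly decompression concrete, the following minimal Python sketch stores each coefficient with only as many leading bytes of its IEEE 754 double representation as a relative accuracy `eps` requires, and rebuilds each value right before it is used in a dot product. This is a simplified byte-truncation scheme chosen for illustration, not the AFLP or FPX formats examined in the paper; the function names (`bytes_needed`, `compress`, `decompress_at`, `compressed_dot`) are hypothetical, and the error bound assumes normalized (non-subnormal) values.

```python
import math
import struct


def bytes_needed(eps: float) -> int:
    """Bytes per value that keep the relative error below eps (assumed scheme).

    A double has 1 sign bit, 11 exponent bits and 52 mantissa bits.
    Truncating to m mantissa bits gives a relative error below 2**-m,
    so m = ceil(log2(1/eps)) bits suffice for normalized values.
    """
    mantissa_bits = min(52, max(0, math.ceil(-math.log2(eps))))
    return min(8, math.ceil((12 + mantissa_bits) / 8))


def compress(values, eps):
    """Keep only the leading bytes of each big-endian double."""
    n = bytes_needed(eps)
    return n, b"".join(struct.pack(">d", v)[:n] for v in values)


def decompress_at(data: bytes, n: int, i: int) -> float:
    """Rebuild value i by padding the dropped low-order bytes with zeros."""
    chunk = data[i * n:(i + 1) * n] + bytes(8 - n)
    return struct.unpack(">d", chunk)[0]


def compressed_dot(data: bytes, n: int, x) -> float:
    """Dot product that decompresses each coefficient right before use."""
    return sum(decompress_at(data, n, i) * xi for i, xi in enumerate(x))


# One matrix row stored with accuracy 1e-4 instead of full double precision.
row = [0.1 * (i + 1) for i in range(8)]
x = [1.0 + 0.5 * i for i in range(8)]
n, data = compress(row, 1e-4)
print(n, "bytes per value instead of 8")
print(compressed_dot(data, n, x), "vs", sum(r * xi for r, xi in zip(row, x)))
```

In this sketch the multiplication reads half as many bytes per coefficient for `eps = 1e-4`, which is the effect a memory-bound kernel can benefit from; the decompression itself adds arithmetic work, so whether it pays off in practice depends on the format and hardware.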

Paper Structure (15 sections, 17 equations, 15 figures, 1 table, 8 algorithms)

This paper contains 15 sections, 17 equations, 15 figures, 1 table, 8 algorithms.

Figures (15)

  • Figure 1: Matrix storage for different $\mathcal{H}$ formats depending on matrix size (left) and accuracy (right).
  • Figure 2: Separate (left), shared (center) and nested (right) cluster bases for $\mathcal{H}$-matrices, $\mathcal{U}\mathcal{H}$-matrices and $\mathcal{H}^2$-matrices.
  • Figure 3: Stacking of low-rank factors for BLR clustering.
  • Figure 4: Stacking of low-rank factors per level of the $\mathcal{H}$-matrix.
  • Figure 5: Sorting of $\mathcal{H}$-matrix blocks per block row and hierarchy level.
  • ...and 10 more figures

Theorems & Definitions (10)

  • Definition 2.1: Cluster Tree
  • Definition 2.2: Block Tree
  • Definition 2.3: $\mathcal{H}$-Matrix
  • Remark 2.4
  • Definition 2.5
  • Remark 3.1
  • Remark 3.2
  • Remark 3.3
  • Remark 3.4
  • Remark 4.1