LRAMM -- Low precision approximates GEMM via RSVD
Hongyaoxing Gu
TL;DR
LRAMM targets fast, accurate approximate matrix multiplication by combining mixed-precision quantized GEMMs with low-rank decompositions computed by randomized SVD (RSVD). The method approximates A and B by rank-$r$ factorizations obtained via RSVD, then forms the product of the two low-rank approximations using three quantized GEMMs. A formal error analysis bounds the quantization error, the RSVD truncation error, and their interaction. The work also provides a time-complexity analysis, guidance on parameter selection, and an empirical evaluation across matrix sizes and input distributions, showing speedups with controllable accuracy, especially when the input matrices have low-rank structure. The approach is relevant to accelerating large-scale ML and scientific computing workloads on mixed-precision hardware.
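The pipeline summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (`rsvd`, `quantize`, `qgemm`, `lramm_sketch`), the oversampling parameter, the symmetric per-tensor 8-bit quantization scheme, and the order in which the three GEMMs are grouped are all assumptions made for illustration, and integer arithmetic is only simulated on the CPU.

```python
import numpy as np

def rsvd(M, r, oversample=8, seed=0):
    """Basic randomized SVD: returns rank-r factors U, s, Vt of M."""
    rng = np.random.default_rng(seed)
    k = min(r + oversample, min(M.shape))
    omega = rng.standard_normal((M.shape[1], k))   # random test matrix
    Q, _ = np.linalg.qr(M @ omega)                 # orthonormal range basis
    U_small, s, Vt = np.linalg.svd(Q.T @ M, full_matrices=False)
    U = Q @ U_small
    return U[:, :r], s[:r], Vt[:r, :]

def quantize(M, bits=8):
    """Symmetric per-tensor quantization (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    amax = np.max(np.abs(M))
    scale = amax / qmax if amax > 0 else 1.0
    q = np.clip(np.round(M / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def qgemm(X, Y, bits=8):
    """Simulated quantized GEMM: integer product, then rescale to float."""
    qx, sx = quantize(X, bits)
    qy, sy = quantize(Y, bits)
    return (qx @ qy).astype(np.float64) * (sx * sy)

def lramm_sketch(A, B, r, bits=8):
    """Approximate A @ B from rank-r RSVD factors using three quantized GEMMs."""
    Ua, sa, Vat = rsvd(A, r, seed=0)
    Ub, sb, Vbt = rsvd(B, r, seed=1)
    left = Ua * sa                   # m x r, equals Ua @ diag(sa)
    right = sb[:, None] * Vbt        # r x n, equals diag(sb) @ Vbt
    core = qgemm(Vat, Ub, bits)      # GEMM 1: r x r
    tmp = qgemm(left, core, bits)    # GEMM 2: m x r
    return qgemm(tmp, right, bits)   # GEMM 3: m x n
```

When A is m x k and B is k x n with r much smaller than m, k, and n, the three products cost on the order of r²k + mr² + mrn operations instead of mkn, which is where a speedup over a full GEMM would come from in this sketch; the cost of the two RSVD calls is additional.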
Abstract
Accelerating matrix multiplication is an active research topic across many domains. For some applications, approximate matrix multiplication can deliver significant performance improvements with little loss of precision. In this paper, we propose LRAMM, a high-performance approximate matrix multiplication algorithm that combines mixed-precision quantized matrix multiplication with randomized SVD (RSVD). By exploiting low-rank matrix decomposition, LRAMM further improves efficiency while keeping the error within the range of low-precision matrix multiplication.
