Performance evaluation of accelerated real and complex multiple-precision sparse matrix-vector multiplication

Tomonori Kouya

TL;DR

This work tackles the efficiency of sparse matrix–vector multiplication when using multiple-precision arithmetic to improve numerical stability in Krylov subspace methods. It introduces SIMD-accelerated algorithms for real and complex SpMV across DD, TD, QD, and MPFR precisions, implemented via AVX2 and OpenMP within CSR-based sparse formats. The study demonstrates substantial speedups for larger matrices and highlights nuanced performance depending on matrix structure and the SpMV vs SpTMV workload, with notable gains in Krylov iterations for nd3k. The results underscore the practicality of high-precision, SIMD-enhanced SpMV for robust linear algebra in scientific computing and AI-related workloads, and point to future work in Lanczos methods and mixed-precision preconditioning.

Abstract

Sparse matrices have recently played a significant role in scientific computing, including artificial intelligence-related fields. According to historical studies on sparse matrix–vector multiplication (SpMV), Krylov subspace methods are particularly sensitive to the effects of round-off errors when using floating-point arithmetic. By employing multiple-precision linear computation, convergence can be stabilized by reducing these round-off errors. In this paper, we present the performance of our accelerated SpMV using SIMD instructions, demonstrating its effectiveness through various examples, including Krylov subspace methods.
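To make the idea of a multiple-precision SpMV concrete, the following is a minimal scalar sketch of a CSR matrix–vector product with double-double (DD) accumulation. It is not the paper's AVX2/OpenMP implementation: the function names (`csr_spmv_dd`, `dd_add`, `dd_mul`, and so on) are hypothetical, each DD number is assumed to be stored as a `(hi, lo)` pair of doubles, and the addition uses the simpler, less rigorous "sloppy" variant.

```python
# Scalar sketch of CSR SpMV in double-double (DD) arithmetic.
# All names here are illustrative assumptions, not the paper's API.

def two_sum(a, b):
    """Error-free sum: returns (s, err) with s + err == a + b exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err

def quick_two_sum(a, b):
    """Error-free sum assuming |a| >= |b| (or a == 0)."""
    s = a + b
    return s, b - (s - a)

def split(a):
    """Veltkamp split of a double into two non-overlapping halves."""
    t = 134217729.0 * a  # 2**27 + 1
    hi = t - (t - a)
    return hi, a - hi

def two_prod(a, b):
    """Error-free product (Dekker): returns (p, err) with p + err == a * b."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    err = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, err

def dd_add(x, y):
    """'Sloppy' DD addition of two (hi, lo) pairs."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)

def dd_mul(x, y):
    """DD multiplication of two (hi, lo) pairs."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    return quick_two_sum(p, e)

def csr_spmv_dd(n_rows, row_ptr, col_idx, values, x):
    """y = A x, with A in CSR format and all entries stored as DD pairs."""
    y = []
    for i in range(n_rows):
        acc = (0.0, 0.0)
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc = dd_add(acc, dd_mul(values[k], x[col_idx[k]]))
        y.append(acc)
    return y

# 1x3 matrix [1 1 1] times x = [1e16, 1, -1e16]; the exact result is 1.
row_ptr = [0, 3]
col_idx = [0, 1, 2]
values = [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
x = [(1e16, 0.0), (1.0, 0.0), (-1e16, 0.0)]

y_dd = csr_spmv_dd(1, row_ptr, col_idx, values, x)
y_double = (1.0 * 1e16 + 1.0 * 1.0) + 1.0 * -1e16  # plain double precision
```

In this small example the plain double-precision sum loses the contribution of the middle term to cancellation, while the DD accumulation retains it. This is the kind of round-off reduction the abstract refers to. A SIMD version would process several rows or several nonzeros per instruction, which this sketch does not attempt.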

Paper Structure

This paper contains 12 sections, 5 equations, 6 figures, 5 tables, 5 algorithms.

Figures (6)

  • Figure 1: Calculation of SIMDized SpMV
  • Figure 2: Structure of tub1000 (left) and nd3k (right) [ufsparse]
  • Figure 3: Speedup ratio of parallelized SpMV, mixed-precision SpMV (dSpMV) for tub1000
  • Figure 4: Number of iterations of Krylov subspace methods (upper) and speedup ratio (lower) for tub1000
  • Figure 5: Speedup ratio of parallelized SpMV, dSpMV for nd3k
  • ...and 1 more figures