
A mixed precision Jacobi SVD algorithm

Weiguo Gao, Yuxin Ma, Meiyue Shao

TL;DR

The paper introduces a mixed precision Jacobi SVD algorithm that accelerates the dense SVD by performing the preconditioning and an initial SVD in a lower precision, then refining the result in working precision with one-sided Jacobi rotations. The approach relies on QR-based preconditioning, a careful switch back to working precision, and robust refinement, with selective handling of challenging cases to preserve speed and accuracy. Backward stability is established theoretically and quadratic convergence is observed in practice, and extensive numerical experiments show about a twofold speedup on x86-64 over the one-sided Jacobi SVD in LAPACK without sacrificing accuracy. The method offers a practical route to fast, high-accuracy SVD computations and points to possible extensions to GPUs and Hermitian eigenproblems.

Abstract

We propose a mixed precision Jacobi algorithm for computing the singular value decomposition (SVD) of a dense matrix. After appropriate preconditioning, the proposed algorithm computes the SVD in a lower precision as an initial guess, and then performs one-sided Jacobi rotations in the working precision as iterative refinement. By carefully transforming a lower precision solution to a higher precision one, our algorithm achieves about 2 times speedup on the x86-64 architecture compared to the usual one-sided Jacobi SVD algorithm in LAPACK, without sacrificing the accuracy.
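The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: it omits the QR preconditioning, uses `float32`/`float64` as the lower and working precisions, and re-orthonormalizes the low-precision right singular vectors with a plain QR factorization as a stand-in for the paper's careful precision switch. The function names `one_sided_jacobi` and `mixed_precision_jacobi_svd` are hypothetical, and the input is assumed to have full column rank.

```python
import numpy as np

def one_sided_jacobi(X, V, max_sweeps=30):
    """Hestenes one-sided Jacobi in float64: rotate column pairs of X until
    they are mutually orthogonal, accumulating the rotations in V."""
    X = np.array(X, dtype=np.float64)
    V = np.array(V, dtype=np.float64)
    m, n = X.shape
    tol = m * np.finfo(np.float64).eps  # illustrative stopping threshold
    sweeps = 0
    for sweeps in range(1, max_sweeps + 1):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = X[:, p] @ X[:, p]
                beta = X[:, q] @ X[:, q]
                gamma = X[:, p] @ X[:, q]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue  # columns p and q are already orthogonal enough
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                if zeta == 0.0:
                    t = 1.0
                else:
                    t = np.sign(zeta) / (abs(zeta) + np.hypot(1.0, zeta))
                c = 1.0 / np.hypot(1.0, t)
                s = c * t
                for M in (X, V):  # apply the same plane rotation to X and V
                    mp = M[:, p].copy()
                    M[:, p] = c * mp - s * M[:, q]
                    M[:, q] = s * mp + c * M[:, q]
        if not rotated:
            break
    sigma = np.linalg.norm(X, axis=0)  # column norms are the singular values
    U = X / sigma                      # assumes full column rank (sigma > 0)
    return U, sigma, V, sweeps

def mixed_precision_jacobi_svd(A):
    """Low-precision SVD as the initial guess, float64 Jacobi as refinement."""
    A = np.asarray(A, dtype=np.float64)
    n = A.shape[1]
    # Initial guess: right singular vectors computed entirely in float32.
    _, _, Vh_lo = np.linalg.svd(A.astype(np.float32), full_matrices=False)
    # Assumed precision switch: re-orthonormalize the guess in float64.
    V0, _ = np.linalg.qr(Vh_lo.T.astype(np.float64))
    # Refinement: A @ V0 has nearly orthogonal columns, so few sweeps remain.
    U, sigma, V, sweeps = one_sided_jacobi(A @ V0, V0)
    order = np.argsort(sigma)[::-1]
    return U[:, order], sigma[order], V[:, order], sweeps
```

Calling `mixed_precision_jacobi_svd(A)` on a tall full-rank matrix returns factors with `A ≈ U @ np.diag(sigma) @ V.T`. Because the low-precision guess already nearly diagonalizes the problem, the working-precision Jacobi stage should need noticeably fewer sweeps than running `one_sided_jacobi(A, np.eye(n))` from scratch, which is the source of the speedup the abstract describes.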

Paper Structure

This paper contains 23 sections, 7 theorems, 74 equations, 17 figures, 2 tables, 6 algorithms.

Key Result

Proposition 1

Let $\hat{U}_X$ and $\hat{\Sigma}$, respectively, consist of the computed left singular vectors and singular values of the matrix $\mathrm{fl}(X\hat{Q})$. Then there exist a unitary matrix $\tilde{V}_X$ and a backward perturbation $F$ such that … where … for all $i$. In addition, if $\hat{V}_X$ consists of the computed right singular vectors, then $\hat{U}_X\hat{\Sigma}\hat{V}\ldots$

Figures (17)

  • Figure 1: Relative run time of Algorithm \ref{alg:basic-msvj} for $4096\times4096$ real matrices with $(\kappa(D),\kappa(B))=(10^{2},10^{12})$. The numbers in the bars denote the numbers of iterations required by the Jacobi algorithm.
  • Figure 2: Relative run time of Algorithm \ref{alg:mprrqr} for $4096\times4096$ real matrices with $(\kappa(D),\kappa(B))=(10^{2},10^{12})$.
  • Figure 3: Relative run time of Algorithm \ref{alg:mprrqr} for $4096\times4096$ real matrices with $(\kappa(D),\kappa(B))=(10^{20},10^{2})$.
  • Figure 4: Relative run time of Algorithm \ref{alg:msvj} for $4096\times4096$ real matrices with $(\kappa(D),\kappa(B))=(10^{2},10^{12})$. The numbers in the bars denote the numbers of iterations required by the Jacobi algorithm.
  • Figure 5: Relative run time of Algorithm \ref{alg:msvj} for $4096\times4096$ real matrices with $(\kappa(D),\kappa(B))=(10^{20},10^{2})$. The numbers in the bars denote the numbers of iterations required by the Jacobi algorithm.
  • ...and 12 more figures

Theorems & Definitions (14)

  • Proposition 1
  • proof
  • Theorem 1
  • Remark 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • proof: Proof of Theorem \ref{thm:pre}
  • Theorem 2
  • ...and 4 more