Fast Hermitian Diagonalization with Nearly Optimal Precision

Rikhav Shah

TL;DR

The paper tackles Hermitian diagonalization in finite-precision arithmetic: computing $U$ and $D$ with $\|A-UDU^*\| \le \varepsilon\|A\|$ while minimizing both runtime and the precision required per operation. It develops a spectral bisection framework that runs in near matrix multiplication time, built on a stable computation of the matrix sign function via Newton-Schulz iteration and a refined deflation step, and achieves a precision bound of $\lg(1/\varepsilon)+O(\log n+\log\log(1/\varepsilon))$ bits. The Hermitian setting admits markedly tighter bit requirements than the general case, supported by a concrete lower bound and a thorough stability analysis. The result is relevant to high-precision eigendecomposition of large Hermitian matrices, enabling efficient, backward-stable diagonalization in finite arithmetic.
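To make the spectral bisection idea concrete, here is a minimal NumPy sketch of a single split in ordinary double precision. It is not the paper's algorithm and carries none of its stability guarantees: the function names `sign_newton_schulz` and `split_once`, the tolerances, and the randomized range finder used in place of the paper's deflation step are all assumptions made for illustration.

```python
import numpy as np

def sign_newton_schulz(X, tol=1e-13, max_iter=200):
    """Approximate sign(X) for Hermitian X whose eigenvalues lie in [-1, 1]
    and are bounded away from 0 (hypothetical helper, not from the paper)."""
    I = np.eye(X.shape[0], dtype=X.dtype)
    for _ in range(max_iter):
        # Inverse-free Newton-Schulz update: X <- X (3I - X^2) / 2
        X_next = 0.5 * X @ (3.0 * I - X @ X)
        if np.linalg.norm(X_next - X, 2) < tol:
            return X_next
        X = X_next
    raise RuntimeError("no convergence; an eigenvalue may be too close to the shift")

def split_once(A, shift, rng):
    """Return (U, r) such that U^* A U is approximately block diagonal, with the
    r eigenvalues above `shift` in the leading block."""
    n = A.shape[0]
    I = np.eye(n, dtype=A.dtype)
    B = A - shift * I
    # Scale so the spectrum lies in [-1, 1]; assumes B is nonzero.
    S = sign_newton_schulz(B / np.linalg.norm(B, 2))
    P = 0.5 * (I + S)                     # projector onto eigenvalues > shift
    r = int(round(np.trace(P).real))      # rank of the projector
    if r == 0 or r == n:
        raise ValueError("shift does not split the spectrum")
    # Assumption: a randomized range finder stands in for the deflation step.
    G = rng.standard_normal((n, n))
    Q_plus, _ = np.linalg.qr(P @ G[:, :r])
    Q_minus, _ = np.linalg.qr((I - P) @ G[:, r:])
    return np.hstack([Q_plus, Q_minus]), r
```

Applying `split_once` recursively to the two diagonal blocks of `U.conj().T @ A @ U` would give a full bisection scheme. Choosing shifts that keep eigenvalues away from the split point, and bounding the rounding error of every step, is where the paper's analysis does its work.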

Abstract

Algorithms for numerical tasks in finite precision simultaneously seek to minimize the number of floating point operations performed, and also the number of bits of precision required by each floating point operation. This paper presents an algorithm for Hermitian diagonalization requiring only $\lg(1/\varepsilon)+O(\log(n)+\log\log(1/\varepsilon))$ bits of precision where $n$ is the size of the input matrix and $\varepsilon$ is the target error. Furthermore, it runs in near matrix multiplication time. In the general setting, the first complete analysis of the stability of a near matrix multiplication time algorithm for diagonalization is that of Banks et al. [BGVKS20]. They exhibit an algorithm for diagonalizing an arbitrary matrix up to $\varepsilon$ backward error using only $O(\log^4(n/\varepsilon)\log(n))$ bits of precision. This work focuses on the Hermitian setting, where we determine a dramatically improved bound on the number of bits needed. In particular, the result is close to providing a practical bound. The exact bit count depends on the specific implementation of matrix multiplication and QR decomposition one wishes to use, but if one uses suitable $O(n^3)$-time implementations, then for $\varepsilon=10^{-15},n=4000$, we show 92 bits of precision suffice (and 59 are necessary). By comparison, the same parameters in [BGVKS20] do not even show that 682,916,525,000 bits suffice.

Paper Structure

This paper contains 8 sections, 12 theorems, 115 equations, 1 figure, and 3 algorithms.

Key Result

Proposition 1.1

Any method that computes $U,D$ for a given symmetric $A$ satisfying $\left\|A-UDU^*\right\|\le\varepsilon\left\|A\right\|$ requires $\lg(1/\mathbf{u})\ge\lg(1/\varepsilon)+0.5\lg(n)-2$ bits of precision.

Figures (1)

  • Figure 1: Convergence plots for Newton iteration (left) and Newton-Schulz iteration (right). The color denotes the number of iterations $k$ until $\left|z_k^2-1\right|<10^{-15}$. The lightest shade of yellow is one iteration and each ring denotes one additional iteration. Purple denotes "does not converge".
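The quantity plotted in Figure 1 can be reproduced pointwise with a short script. The sketch below assumes the standard scalar forms of the two iterations for the sign function, $z \mapsto (z + 1/z)/2$ for Newton and $z \mapsto z(3 - z^2)/2$ for Newton-Schulz. The function names and the iteration cap `max_iter` are illustrative choices, not taken from the paper.

```python
import cmath

def newton_step(z):
    # Newton's method applied to z^2 = 1; undefined at z = 0.
    return 0.5 * (z + 1.0 / z)

def newton_schulz_step(z):
    # Inverse-free variant: uses multiplications only.
    return 0.5 * z * (3.0 - z * z)

def iterations_to_converge(step, z0, tol=1e-15, max_iter=100):
    """Return the smallest k with |z_k^2 - 1| < tol, or None if the iteration
    breaks down or has not converged after max_iter steps (assumed cap)."""
    z = complex(z0)
    for k in range(max_iter + 1):
        if abs(z * z - 1.0) < tol:
            return k
        try:
            z = step(z)
        except (ZeroDivisionError, OverflowError):
            return None
        if not cmath.isfinite(z):
            return None
    return None
```

Evaluating `iterations_to_converge` over a grid of complex starting points and coloring each point by the returned count, with `None` mapped to the "does not converge" color, gives a plot of the same kind as the figure.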

Theorems & Definitions (36)

  • Remark 1: Double-double precision
  • Proposition 1.1: Lower bound
  • proof
  • Definition 1.1: From b6
  • Definition 1.2: From b7
  • Definition 1.3
  • Definition 1.4
  • Remark 2: Polar decomposition
  • Remark 3: Faster iterative algorithms
  • Lemma 2.1: One step error bound
  • ...and 26 more