Fast Hermitian Diagonalization with Nearly Optimal Precision

Rikhav Shah

TL;DR

The paper tackles Hermitian diagonalization in finite-precision arithmetic: computing $U$ and $D$ with $\|A-UDU^*\| \le \varepsilon\|A\|$ while minimizing both the runtime and the precision required by each operation. It develops a spectral bisection framework that runs in near matrix multiplication time, built on a stable computation of the matrix sign function via Newton-Schulz iteration and a refined deflation step, and achieves a precision bound of $\lg(1/\varepsilon)+O(\log n+\log\log(1/\varepsilon))$ bits. The Hermitian setting yields markedly tighter bit requirements than the general case, and the upper bound is complemented by a concrete lower bound and a full stability analysis. The result is relevant to high-precision eigendecomposition of large Hermitian matrices, where it gives an efficient, backward-stable diagonalization in finite arithmetic.

Abstract

Algorithms for numerical tasks in finite precision seek to minimize both the number of floating point operations performed and the number of bits of precision required by each floating point operation. This paper presents an algorithm for Hermitian diagonalization requiring only $\lg(1/\varepsilon)+O(\log(n)+\log\log(1/\varepsilon))$ bits of precision, where $n$ is the size of the input matrix and $\varepsilon$ is the target error. Furthermore, it runs in near matrix multiplication time. In the general setting, the first complete analysis of the stability of a near matrix multiplication time algorithm for diagonalization is that of Banks et al. [BGVKS20]. They exhibit an algorithm for diagonalizing an arbitrary matrix up to $\varepsilon$ backward error using only $O(\log^4(n/\varepsilon)\log(n))$ bits of precision. This work focuses on the Hermitian setting, where we determine a dramatically improved bound on the number of bits needed. In particular, the result is close to providing a practical bound. The exact bit count depends on the specific implementation of matrix multiplication and QR decomposition one wishes to use, but if one uses suitable $O(n^3)$-time implementations, then for $\varepsilon=10^{-15},n=4000$, we show 92 bits of precision suffice (and 59 are necessary). By comparison, for the same parameters the analysis in [BGVKS20] does not even show that 682,916,525,000 bits suffice.

Paper Structure

This paper contains 8 sections, 12 theorems, 115 equations, 1 figure, and 3 algorithms.

Introduction
Contributions
Model of computation
Subroutines
Matrix sign function
Insufficiency of Newton iteration
Analysis of deflate
Spectral bisection

Key Result

Proposition 1.1

Any method that computes $U,D$ for a given symmetric $A$ satisfying $\left\|A-UDU^*\right\|\le\varepsilon\left\|A\right\|$ requires $\lg(1/\mathbf{u})\ge\lg(1/\varepsilon)+0.5\lg(n)-2$ bits of precision.
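To get a feel for the size of this bound, the right-hand side of the inequality can be evaluated directly. The snippet below is a minimal sketch that only transcribes the formula as stated in Proposition 1.1; the function name `lower_bound_bits` is hypothetical, and the snippet is not meant to reproduce the specific bit counts quoted in the abstract, which may rest on details not shown in this summary.

```python
import math


def lower_bound_bits(eps: float, n: int) -> float:
    """Right-hand side of Proposition 1.1: lg(1/eps) + 0.5*lg(n) - 2.

    `eps` is the target relative backward error and `n` the matrix size.
    The name of this helper is illustrative, not taken from the paper.
    """
    if not (0.0 < eps < 1.0):
        raise ValueError("eps must lie strictly between 0 and 1")
    if n < 1:
        raise ValueError("n must be a positive integer")
    return -math.log2(eps) + 0.5 * math.log2(n) - 2.0


if __name__ == "__main__":
    # Parameters borrowed from the abstract's example.
    print(lower_bound_bits(1e-15, 4000))
```

The bound grows by one bit each time $\varepsilon$ is halved and by one bit each time $n$ is quadrupled, which is the $\lg(1/\varepsilon)+0.5\lg(n)$ dependence in the statement.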

Figures (1)

Figure 1: Convergence plots for Newton iteration (left) and Newton-Schulz iteration (right). The color denotes the number of iterations $k$ until $\left|z_k^2-1\right|<10^{-15}$. The lightest shade of yellow is one iteration and each ring denotes one additional iteration. Purple denotes "does not converge".
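The quantity plotted in Figure 1 can be illustrated with scalar versions of the two iterations. The sketch below assumes the standard textbook forms for the sign function, $z \mapsto \tfrac12(z+1/z)$ for Newton and $z \mapsto \tfrac12 z(3-z^2)$ for Newton-Schulz; the paper's finite-precision variants and its exact iteration cap are not given in this summary, so the helper names and `max_iter` are assumptions.

```python
def newton_step(z: complex) -> complex:
    """One step of the classical Newton iteration for sign(z)."""
    return 0.5 * (z + 1.0 / z)


def newton_schulz_step(z: complex) -> complex:
    """One step of the inverse-free Newton-Schulz iteration for sign(z)."""
    return 0.5 * z * (3.0 - z * z)


def iterations_to_converge(step, z0, tol=1e-15, max_iter=100):
    """Return the first k with |z_k^2 - 1| < tol, or None if none is found.

    None stands in for the "does not converge" region of the figure.
    `max_iter` is an arbitrary cap chosen for this sketch.
    """
    z = complex(z0)
    try:
        for k in range(max_iter + 1):
            if abs(z * z - 1.0) < tol:
                return k
            z = step(z)
    except (ZeroDivisionError, OverflowError):
        # Division by zero (Newton) or overflow counts as non-convergence.
        return None
    return None


if __name__ == "__main__":
    for z0 in (0.5, 3.0, 0.3 + 0.4j, 2j):
        print(z0,
              iterations_to_converge(newton_step, z0),
              iterations_to_converge(newton_schulz_step, z0))
```

Starting points on the imaginary axis never converge under Newton, and starting points far from $\pm 1$ diverge under Newton-Schulz, which is the kind of behavior the purple regions of the plots record.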

Theorems & Definitions (36)

Remark 1: Double-double precision
Proposition 1.1: Lower bound
proof
Definition 1.1: From b6
Definition 1.2: From b7
Definition 1.3
Definition 1.4
Remark 2: Polar decomposition
Remark 3: Faster iterative algorithms
Lemma 2.1: One step error bound
...and 26 more
