Reorthogonalized Pythagorean variants of block classical Gram-Schmidt
Erin Carson, Kathryn Lund, Yuxin Ma, Eda Oktay
TL;DR
The paper develops two reorthogonalized variants of block classical Gram-Schmidt, PIP+ and PIPI+, both built on a Pythagorean inner product, to counter loss of orthogonality (LOO) in finite precision. For both variants the authors prove an $O(\varepsilon)$ bound on the LOO, together with residual bounds, under the mild conditioning assumption $O(\varepsilon)\kappa^2(\bm{\mathcal X}) \le 1/2$. The analysis is stated generally enough to cover mixed-precision implementations, including two-precision variants. Numerical experiments with the BlockStab toolbox, run on several classes of test matrices, check the bounds and assess stability under different synchronization patterns and IO strategies. The results indicate that the proposed low-synchronization methods are practical for iterative solvers in high-performance computing, while very ill-conditioned inputs that violate the conditioning assumption may still require additional techniques such as restarting or preconditioning.
Abstract
Block classical Gram-Schmidt (BCGS) is commonly used for orthogonalizing a set of vectors $X$ in distributed computing environments due to its favorable communication properties relative to other orthogonalization approaches, such as modified Gram-Schmidt or Householder. However, it is known that BCGS (as well as recently developed low-synchronization variants of BCGS) can suffer from a significant loss of orthogonality in finite-precision arithmetic, which can contribute to instability and inaccurate solutions in downstream applications such as $s$-step Krylov subspace methods. A common solution to improve the orthogonality among the vectors is reorthogonalization. Focusing on the "Pythagorean" variant of BCGS, introduced in [E. Carson, K. Lund, & M. Rozložník. SIAM J. Matrix Anal. Appl. 42(3), pp. 1365-1380, 2021], which guarantees an $O(\varepsilon)\kappa^2(X)$ bound on the loss of orthogonality as long as $O(\varepsilon)\kappa^2(X)<1$, where $\varepsilon$ denotes the unit roundoff, we introduce and analyze two reorthogonalized Pythagorean BCGS variants. These variants feature favorable communication properties, with asymptotically two synchronization points per block column, as well as an improved $O(\varepsilon)$ bound on the loss of orthogonality. Our bounds are derived in a general fashion to additionally allow for the analysis of mixed-precision variants. We verify our theoretical results with a panel of test matrices and experiments from a new version of the BlockStab toolbox.
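
To make the idea concrete, the following is a minimal NumPy sketch of a Pythagorean-inner-product block step applied twice per block column, in the spirit of the reorthogonalized variants described above. It is an illustration under stated assumptions, not the authors' algorithm or the BlockStab implementation: the function names `pip_step` and `bcgs_pip_reorth`, the block size parameter `s`, and the use of `np.linalg.qr` (Householder QR) for the first block are all choices made here for illustration, and everything runs in a single precision (double).

```python
import numpy as np


def pip_step(Q, X):
    """Project block X against orthonormal Q using one fused inner product.

    Returns U, S, Y with X = Q @ S + U @ Y and Y upper triangular.
    (Hypothetical helper; the name is not taken from the paper.)
    """
    m = Q.shape[1]
    # One fused product [Q, X]^T X. In a distributed setting this would be
    # the single synchronization point of the step.
    G = np.hstack([Q, X]).T @ X
    S, Omega = G[:m], G[m:]
    # Pythagorean identity: if Q is orthonormal, then
    # (X - Q S)^T (X - Q S) = X^T X - S^T S = Omega - S^T S.
    Y = np.linalg.cholesky(Omega - S.T @ S).T  # upper triangular, Y^T Y = Omega - S^T S
    V = X - Q @ S
    U = np.linalg.solve(Y.T, V.T).T  # U = V Y^{-1}
    return U, S, Y


def bcgs_pip_reorth(X, s):
    """Block Gram-Schmidt with two Pythagorean steps per block column.

    Assumption: the first block is orthogonalized with Householder QR.
    """
    n, ncols = X.shape
    Q = np.zeros((n, ncols))
    R = np.zeros((ncols, ncols))
    Q[:, :s], R[:s, :s] = np.linalg.qr(X[:, :s])
    for j in range(s, ncols, s):
        k = slice(j, j + s)
        # First pass: X_k = Q_prev S1 + U Y1.
        U, S1, Y1 = pip_step(Q[:, :j], X[:, k])
        # Second pass (reorthogonalization): U = Q_prev S2 + Q_k Y2.
        Qk, S2, Y2 = pip_step(Q[:, :j], U)
        Q[:, k] = Qk
        # Combine both passes: X_k = Q_prev (S1 + S2 Y1) + Q_k (Y2 Y1).
        R[:j, k] = S1 + S2 @ Y1
        R[k, k] = Y2 @ Y1
    return Q, R
```

Each call to `pip_step` forms a single fused inner product, so the two calls per block column correspond to the two synchronization points mentioned in the abstract. For a well-conditioned input `X`, the loss of orthogonality can be measured as `np.linalg.norm(np.eye(X.shape[1]) - Q.T @ Q)` and the relative residual as `np.linalg.norm(X - Q @ R) / np.linalg.norm(X)`. The Cholesky factorization fails with `numpy.linalg.LinAlgError` when `Omega - S.T @ S` is not numerically positive definite, which is how a violated conditioning assumption shows up in this sketch.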
