Random-sketching Techniques to Enhance the Numerical Stability of Block Orthogonalization Algorithms for s-step GMRES

Ichitaro Yamazaki, Andrew J. Higgins, Erik G. Boman, Daniel B. Szyld

TL;DR

The paper addresses the numerical stability of block orthogonalization in $s$-step GMRES by integrating random sketching into the intra-block orthogonalization, which keeps the orthogonality error at $O(\epsilon)$ as long as the block vectors are numerically full rank. It introduces RandCholQR and two-stage frameworks (with preprocessing via BCGS-PIP or RandBCGS2) to maintain well-conditioned bases while reducing communication. The authors provide theoretical bounds based on $\mu$-subspace embeddings, present an implementation within Trilinos, and demonstrate through GPU-accelerated experiments on Perlmutter that the stabilization comes with modest overhead and scalable performance. Overall, the work delivers a portable, numerically robust approach to communication-avoiding (CA) Krylov methods with practical impact for large-scale GMRES in high-performance computing.

Abstract

We integrate random sketching techniques into the block orthogonalization schemes needed for $s$-step GMRES. The resulting schemes generate basis vectors whose overall orthogonality error is bounded by machine precision as long as each of the corresponding block vectors is numerically full rank. We implement these randomized block orthogonalization schemes using standard distributed-memory linear algebra kernels for $s$-step GMRES available in the Trilinos software packages. Our performance results on the Perlmutter supercomputer (with four NVIDIA A100 GPUs per node) demonstrate that these randomized techniques can enhance the numerical stability of the orthogonalization and of the overall solver without a significant increase in execution time.

Paper Structure

This paper contains 20 sections, 8 theorems, 34 equations, 15 figures, 4 tables.

Key Result

Corollary 3.1.1

If the sketch matrix $\Theta \in \mathbb{R}^{n\times\widehat{m}}$ is a $\mu$-subspace embedding for the subspace $\mathcal{V} \subset \mathbb{R}^n$, then $\forall x \in \mathcal{V}$,

Figures (15)

  • Figure 1: Pseudocode of $s$-step GMRES, where $[Q_j,R_j] = \hbox{qr}(Q,V_j)$ extends the QR factorization such that $Q R = V$ with $Q^TQ = I$ and upper-triangular $R$ with non-negative diagonals.
  • Figure 2: Block Classical Gram-Schmidt (BCGS) to orthogonalize $V_j$ against the orthonormal vectors $Q_{1:j-1}$, where "chol($G$)" returns the upper-triangular Cholesky factor of $G$.
  • Figure 3: Recursive Cholesky QR (CholQR) to orthonormalize a set of vectors $V \in \mathbb{R}^{n\times s}$, where "$\hbox{chol}(G)$" returns the upper-triangular Cholesky factor of the Gram matrix $G$.
  • Figure 4: Randomized QR algorithm, where $\hbox{HH}(V)$ returns the orthogonal basis vectors $Q$ and the upper-triangular matrix $R$ based on the Householder QR algorithm such that $V=QR$.
  • Figure 5: Two-stage Block Orthogonalization with MPK.
  • ...and 10 more figures
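
The captions of Figures 3 and 4 can be illustrated with a small NumPy sketch. It is not the authors' Trilinos implementation. It assumes a dense Gaussian sketch matrix scaled by $1/\sqrt{\widehat{m}}$, hypothetical sizes ($n = 2000$, $s = 10$, $\widehat{m} = 40$), and hypothetical function names (`cholqr`, `rand_qr`, `rand_cholqr`). Composing a sketched QR with one CholQR pass is only one plausible reading of the name "RandCholQR", and the recursion mentioned in Figure 3 is omitted.

```python
import numpy as np

def cholqr(V):
    """Cholesky QR: factor the Gram matrix G = V^T V = R^T R, then Q = V R^{-1}."""
    G = V.T @ V
    R = np.linalg.cholesky(G).T        # upper-triangular Cholesky factor of G
    # Solve Q R = V for Q; a dedicated triangular solve would be used in practice.
    Q = np.linalg.solve(R.T, V.T).T
    return Q, R

def rand_qr(V, Theta):
    """Sketched QR: Householder QR of the small sketch Theta^T V supplies R."""
    S = Theta.T @ V                    # m_hat-by-s sketch of the n-by-s block
    R = np.linalg.qr(S, mode="r")      # LAPACK Householder QR, R factor only
    Q = np.linalg.solve(R.T, V.T).T    # Q = V R^{-1}: well conditioned, not yet orthonormal
    return Q, R

def rand_cholqr(V, Theta):
    """Assumed composition: sketched QR as a preconditioner, then one CholQR pass."""
    Q1, R1 = rand_qr(V, Theta)
    Q, R2 = cholqr(Q1)
    return Q, R2 @ R1                  # upper triangular; diagonal signs are not normalized

# Hypothetical test problem: a block with condition number of about 1e8.
rng = np.random.default_rng(0)
n, s, m_hat = 2000, 10, 40
U, _ = np.linalg.qr(rng.standard_normal((n, s)))
W, _ = np.linalg.qr(rng.standard_normal((s, s)))
V = (U * np.logspace(0, -8, s)) @ W.T

Theta = rng.standard_normal((n, m_hat)) / np.sqrt(m_hat)   # assumed Gaussian sketch
Q, R = rand_cholqr(V, Theta)

print("orthogonality error:", np.linalg.norm(Q.T @ Q - np.eye(s)))
print("relative residual:  ", np.linalg.norm(Q @ R - V) / np.linalg.norm(V))
```

The sketched step only has to make the block well conditioned. The CholQR pass that follows then works with a Gram matrix that is safe to factor, whereas applying `cholqr` directly to an ill-conditioned `V` squares its condition number and can cause the Cholesky factorization to break down.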

Theorems & Definitions (16)

  • Definition 3.1: $\mu$-subspace embedding
  • Corollary 3.1.1
  • Corollary 3.1.1
  • Corollary 3.1.2
  • Definition 3.2: $(\mu, \delta, \widehat{s})$ oblivious $\ell_2$-subspace embedding
  • Proposition 5.1
  • proof
  • Proposition 5.2
  • proof
  • Proposition 7.1
  • ...and 6 more