Low-synchronization Arnoldi Methods for the Matrix Exponential with Application to Exponential Integrators

Tanya Tafolla, Stéphane Gaudreault, Mayya Tokman

Abstract

High-order exponential integrators require computing linear combinations of exponential-like $\varphi$-functions of large matrices $A$ times a vector $v$. Krylov projection methods are the most general approach and remain an efficient choice for evaluating these matrix-function-vector products when the matrix $A$ is large and cannot be stored explicitly, or when obtaining information about the spectrum is expensive. The Krylov approximation relies on the Gram-Schmidt (GS) orthogonalization procedure to produce the orthonormal basis $V_m$. In parallel, GS orthogonalization requires global synchronizations for the inner products and the vector normalization. Reducing the number of global synchronizations is of paramount importance for the efficiency of a numerical algorithm in a massively parallel setting. We improve the parallel strong scaling properties of exponential integrators by addressing the underlying bottleneck in the linear algebra using low-synchronization GS methods. The resulting orthogonalization algorithms have accuracy comparable to modified Gram-Schmidt yet are better suited to distributed architectures, as only one global communication is required per orthogonalization step. We present geophysics-based numerical experiments and standard examples routinely used to test stiff time integrators, which validate that reducing global communication leads to better parallel scalability and reduced time-to-solution for exponential integrators.
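To make the synchronization bottleneck concrete, the following is a minimal serial sketch of the standard Arnoldi process with modified Gram-Schmidt, not the low-synchronization algorithms of the paper. It counts every inner product and norm as one global reduction, which is what each would cost in a distributed run. The function names (`dot`, `matvec`, `arnoldi_mgs`), the test matrix, and the counting scheme are illustrative assumptions.

```python
import math

def dot(x, y, counter):
    """Inner product; in a distributed run each call is one global reduction."""
    counter[0] += 1
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    """Dense matrix-vector product (purely local work in this sketch)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def arnoldi_mgs(A, v, m):
    """Arnoldi with modified Gram-Schmidt.

    Returns the basis vectors V (m + 1 lists), the (m + 1) x m Hessenberg
    matrix H, and the number of reductions performed. Assumes no breakdown.
    """
    counter = [0]
    beta = math.sqrt(dot(v, v, counter))
    V = [[x / beta for x in v]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(A, V[j])
        for i in range(j + 1):
            # One reduction per previously computed basis vector.
            H[i][j] = dot(V[i], w, counter)
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        # One more reduction for the normalization.
        H[j + 1][j] = math.sqrt(dot(w, w, counter))
        V.append([x / H[j + 1][j] for x in w])
    return V, H, counter[0]

# Hypothetical test problem: 1D Laplacian stencil, first unit vector.
n, m = 6, 4
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
v = [1.0] + [0.0] * (n - 1)
V, H, reductions = arnoldi_mgs(A, v, m)
print(reductions)
```

In this sketch, step $j$ of modified Gram-Schmidt costs $j + 1$ reductions, so building $V_m$ takes $1 + m + m(m+1)/2$ reductions in total (15 for $m = 4$). A scheme with one global communication per orthogonalization step, as described above, would instead need a number of reductions that grows only linearly in $m$.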

Paper Structure

This paper contains 25 sections, 34 equations, 18 figures, 6 tables, 8 algorithms.

Figures (18)

  • Figure 1: Strong scaling results for AC with low-synchronization hybrid methods for epi4 and epi5.
  • Figure 2: Strong scaling results for AC with low-synchronization hybrid methods for epi6.
  • Figure 3: Strong scaling results for AC with low-synchronization hybrid methods for srerk integrators.
  • Figure 4: Strong scaling results for ADR with low-synchronization hybrid methods for epi4 and epi5.
  • Figure 5: Strong scaling results for ADR with low-synchronization hybrid methods for epi6.
  • ...and 13 more figures