Parallel Sparse and Data-Sparse Factorization-based Linear Solvers
Xiaoye Sherry Li, Yang Liu
TL;DR
The paper surveys parallel sparse direct solvers on modern HPC architectures, focusing on two goals: reducing data movement through communication-avoiding (CA) strategies, and lowering arithmetic and memory costs through data-sparse, rank-structured representations (notably $\mathcal{H}$, $\mathcal{H}^2$, HSS, and HODLR). It covers algorithmic frameworks (3D CA algorithms, DAG- and tree-based scheduling), GPU-accelerated implementations, and hybrid structured-sparse/data-sparse solvers that combine frontal matrices with compressed blocks. Practical aspects include the preprocessing, construction, factorization, and solve phases, along with distributed-memory layouts and batching to exploit fine-grained parallelism on CPUs and GPUs. The article also catalogs software packages and outlines open problems in GPU-resident solvers, symmetric indefinite solves, and theoretical analysis of data-sparse methods. Overall, it gives a broad view of progress toward scalable, robust direct solvers for large-scale, ill-conditioned systems arising in PDEs, integral equations, and kernel-based computations.
Abstract
Efficient solution of large-scale, ill-conditioned, and indefinite algebraic equations is needed in many computational fields, including multiphysics simulations, machine learning, and data science. Because of their robustness and accuracy, direct solvers are crucial components in building a scalable solver toolchain. In this article, we review recent advances in sparse direct solvers along two axes: 1) reducing communication and latency costs in both task- and data-parallel settings, and 2) reducing computational complexity via low-rank and other compression techniques such as hierarchical matrix algebra. In addition to algorithmic principles, we also illustrate the key parallelization challenges and best practices for delivering high speed and reliability on modern heterogeneous parallel machines.
