Serinv: A Scalable Library for the Selected Inversion of Block-Tridiagonal with Arrowhead Matrices
Vincent Maillou, Lisa Gaedke-Merzhaeuser, Alexandros Nikolaos Ziogas, Olaf Schenk, Mathieu Luisier
TL;DR
This work addresses the problem of computing only selected entries of the inverse of large, structured sparse matrices that arise in climate modeling and materials science. It introduces Serinv, a distributed, GPU-accelerated library that implements selected inversion based on block-Cholesky factorization for positive-definite block-tridiagonal with arrowhead (BTA) matrices. The distributed variant follows a parallel three-phase workflow that builds a reduced system so that the selected inverse can be computed across partitions. The authors analyze complexity, load balancing, and parallel efficiency theoretically, and validate the approach experimentally on synthetic and INLA-derived datasets, reporting substantial speedups over PARDISO and MUMPS as well as strong and weak scaling up to 16 GPUs (and beyond in some configurations). The results are relevant to large-scale Bayesian inference and statistical modeling in earth sciences and nanoscale materials, and offer a path to problems larger than those handled by prior CPU-centric or GPU-limited methods. Key contributions include: (i) a distributed, GPU-accelerated selected-inversion algorithm for BTA matrices built on block-Cholesky factorization; (ii) a novel partitioning and permutation scheme that exposes parallelism without physically permuting data; (iii) a reduced-system approach that enables efficient parallel selected inversion across partitions; and (iv) theoretical and empirical comparisons showing competitive or superior performance and scalability relative to state-of-the-art solvers.
Abstract
The inversion of structured sparse matrices is a key but computationally and memory-intensive operation in many scientific applications. There are cases, however, where only particular entries of the full inverse are required. This has motivated the development of so-called selected-inversion algorithms, capable of computing only specific elements of the full inverse. Currently, most of them are either shared-memory codes or limited to CPU implementations. Here, we introduce Serinv, a scalable library providing distributed, GPU-based algorithms for the selected inversion and Cholesky decomposition of positive-definite, block-tridiagonal arrowhead matrices. This matrix class is highly relevant in statistical climate modeling and materials science applications. The performance of Serinv is demonstrated on synthetic and real datasets from statistical air temperature prediction models. In our numerical tests, Serinv achieves 32.3% strong and 47.2% weak scaling efficiency and up to two orders of magnitude speedup over the sparse direct solvers PARDISO and MUMPS on 16 GPUs.
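To make the idea of selected inversion on a block-tridiagonal arrowhead matrix concrete, the following is a minimal sequential sketch in dense NumPy. It is not Serinv's API or implementation: the function names `make_bta` and `selected_inverse_bta`, the block layout (n diagonal blocks of size b plus an arrow tip of size a), and the use of explicit inverses of the small diagonal factors are all assumptions made for illustration. It performs a block-Cholesky factorization A = L Lᵀ and then a backward recurrence derived from X L = L⁻ᵀ, which yields only those blocks of X = A⁻¹ that lie on the BTA sparsity pattern.

```python
import numpy as np

def make_bta(n, b, a, seed=0):
    """Hypothetical helper: random SPD matrix with BTA sparsity."""
    rng = np.random.default_rng(seed)
    t = n * b                                   # first row of the arrow tip
    A = np.zeros((t + a, t + a))
    for i in range(n):
        d = slice(i * b, (i + 1) * b)
        A[d, d] = rng.standard_normal((b, b))   # diagonal block
        if i + 1 < n:                           # sub-diagonal block
            A[(i + 1) * b:(i + 2) * b, d] = rng.standard_normal((b, b))
        A[t:, d] = rng.standard_normal((a, b))  # arrow (bottom) block
    A[t:, t:] = rng.standard_normal((a, a))     # arrow tip
    A = np.tril(A)
    A = A + A.T                                 # symmetrize
    # Strict diagonal dominance with a positive diagonal implies SPD.
    A += (np.abs(A).sum(axis=1).max() + 1.0) * np.eye(t + a)
    return A

def selected_inverse_bta(A, n, b):
    """Blocks of inv(A) on the BTA pattern; all other entries stay zero."""
    t = n * b

    def blk(i):       # index range of block i; i == n is the arrow tip
        return slice(i * b, (i + 1) * b) if i < n else slice(t, None)

    def below(i):     # block rows under the diagonal in block column i
        return [i + 1, n] if i + 1 < n else [n]

    # Forward pass: block Cholesky with Schur-complement updates.
    W = A.copy()
    L = np.zeros_like(A)
    for i in range(n):
        d = blk(i)
        L[d, d] = np.linalg.cholesky(W[d, d])
        inv_T = np.linalg.inv(L[d, d]).T        # explicit inverse, for brevity
        for r in below(i):
            L[blk(r), d] = W[blk(r), d] @ inv_T
        for r in below(i):
            for c in below(i):
                if c <= r:                      # update lower blocks only
                    W[blk(r), blk(c)] -= L[blk(r), d] @ L[blk(c), d].T
    tip = blk(n)
    L[tip, tip] = np.linalg.cholesky(W[tip, tip])

    # Backward pass: selected inversion, last block column first.
    X = np.zeros_like(A)
    tip_inv = np.linalg.inv(L[tip, tip])
    X[tip, tip] = tip_inv.T @ tip_inv
    for i in range(n - 1, -1, -1):
        d = blk(i)
        inv = np.linalg.inv(L[d, d])
        for j in below(i):
            acc = sum(X[blk(j), blk(k)] @ L[blk(k), d] for k in below(i))
            X[blk(j), d] = -acc @ inv
            X[d, blk(j)] = X[blk(j), d].T       # mirror into the upper part
        acc = sum(X[d, blk(k)] @ L[blk(k), d] for k in below(i))
        X[d, d] = (inv.T - acc) @ inv
    return X

# Compare against a dense inverse on the BTA pattern only.
A = make_bta(n=4, b=3, a=2)
X = selected_inverse_bta(A, n=4, b=3)
pattern = A != 0
print(np.allclose(X[pattern], np.linalg.inv(A)[pattern]))
```

The sketch only ever touches blocks on the BTA pattern, which is what keeps the cost linear in the number of diagonal blocks. A production code would replace the explicit inverses with triangular solves and, as the abstract describes for Serinv, distribute the work across GPUs.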
