HPRMAT: A high-performance R-matrix solver with GPU acceleration for coupled-channel problems in nuclear physics
Jin Lei
TL;DR
HPRMAT addresses the computational bottleneck in R-matrix coupled-channel scattering by replacing legacy matrix inversion with direct LU-based solving across four backends, including GPU-accelerated and mixed-precision options. The library preserves compatibility with Descouvemont's interface. For matrices of total dimension $N = n_{\rm ch} \times n_{\rm lag} = 25600$, the GPU solver is up to 9$\times$ faster than optimized CPU direct solvers and 18$\times$ faster than legacy inversion-based codes, while relative errors in cross sections stay below $10^{-5}$. The mixed-precision approach exploits the 64:1 FP32:FP64 throughput ratio of consumer GPUs by factorizing in FP32 and applying iterative refinement to recover FP64 accuracy, which brings these calculations within reach of desktop workstations. Validation against Descouvemont's reference code across multiple test cases confirms both numerical reliability and physical fidelity, enabling large-scale CDCC and coupled-channel calculations without expensive data-center resources.
Abstract
I present HPRMAT, a high-performance solver library for the linear systems arising in R-matrix coupled-channel scattering calculations in nuclear physics. Designed as a drop-in replacement for the linear algebra routines in existing R-matrix codes, HPRMAT employs direct linear equation solving with optimized libraries instead of traditional matrix inversion, achieving significant performance improvements. The package provides four solver backends: (1) double-precision LU factorization, (2) mixed-precision arithmetic with iterative refinement, (3) a Woodbury formula approach exploiting the kinetic-coupling matrix structure, and (4) GPU acceleration. Benchmark calculations demonstrate that the GPU solver achieves up to 9$\times$ speedup over optimized CPU direct solvers, and 18$\times$ over legacy inversion-based codes, for large matrices ($N=25600$). The mixed-precision strategy is particularly effective on consumer GPUs (e.g., NVIDIA RTX 3090/4090), where single-precision throughput exceeds double-precision throughput by a factor of 64; by performing the factorization in single precision and then applying iterative refinement, HPRMAT overcomes the poor FP64 performance of consumer hardware while maintaining double-precision accuracy. This makes large-scale CDCC and coupled-channel calculations accessible to researchers using standard desktop workstations, without requiring expensive data-center GPUs. CPU-only solvers provide 5--7$\times$ speedup through optimized libraries and algorithmic improvements. All solvers maintain physics accuracy with relative errors below $10^{-5}$ in cross-section calculations, validated against Descouvemont's reference code (Comput.\ Phys.\ Commun.\ 200, 199--219 (2016)). HPRMAT provides interfaces for Fortran, C, Python, and Julia.
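The mixed-precision idea described in the abstract can be illustrated with a minimal sketch. This is not HPRMAT's implementation or API: the function name `solve_mixed_precision`, the tolerance, the iteration cap, and the real, diagonally dominant test matrix are all assumptions made for illustration, and the sketch runs on the CPU with NumPy and SciPy. It factorizes once in FP32, then repeatedly evaluates the residual in FP64 and reuses the FP32 factors to compute a correction.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve


def solve_mixed_precision(A, b, tol=1e-12, max_iter=20):
    """Solve A x = b with an FP32 LU factorization and FP64 iterative refinement.

    Hypothetical helper for illustration only; not part of HPRMAT.
    """
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)

    # Factorize once in single precision (the expensive O(N^3) step).
    lu32 = lu_factor(A64.astype(np.float32))

    # Initial solution from the FP32 factors, promoted to FP64.
    x = lu_solve(lu32, b64.astype(np.float32)).astype(np.float64)

    b_norm = np.linalg.norm(b64)
    for iteration in range(max_iter):
        # The residual is evaluated in double precision.
        r = b64 - A64 @ x
        if np.linalg.norm(r) <= tol * b_norm:
            return x, iteration
        # The correction reuses the FP32 factors (cheap O(N^2) solve).
        dx = lu_solve(lu32, r.astype(np.float32)).astype(np.float64)
        x += dx
    return x, max_iter


rng = np.random.default_rng(0)
n = 200
# Assumed test problem: a well-conditioned real matrix, not an R-matrix system.
A = rng.standard_normal((n, n)) + n * np.eye(n)
x_true = rng.standard_normal(n)
b = A @ x_true

x, n_iter = solve_mixed_precision(A, b)
rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"refinement steps: {n_iter}, relative error: {rel_err:.2e}")
```

The design point is that the cubic-cost factorization runs entirely in the fast precision, while each refinement step costs only a matrix-vector product and two triangular solves. Convergence depends on the conditioning of the matrix, so an ill-conditioned system may need more steps or a fallback to a full FP64 factorization.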
