Table of Contents
Fetching ...

HPRMAT: A high-performance R-matrix solver with GPU acceleration for coupled-channel problems in nuclear physics

Jin Lei

TL;DR

HPRMAT addresses the computational bottleneck in R-matrix coupled-channel scattering by replacing legacy inversion with direct LU solving across four backends, including GPU-accelerated and mixed-precision options. The library preserves compatibility with Descouvemont’s interface while achieving up to 9× CPU speedups and 18× over legacy codes for matrices of size up to $N = n_{\rm ch} \times n_{\rm lag} \approx 25600$, and maintains cross-section accuracy better than $10^{-5}$ relative. The mixed-precision approach leverages the FP32:FP64 throughput advantage on consumer GPUs to factorize in $FP32$ with iterative refinement to $FP64$ accuracy, broadening accessibility to desktop workstations. Validation against Descouvemont’s reference code across multiple test cases confirms both numerical reliability and physical fidelity, enabling large-scale CDCC and coupled-channel calculations without expensive data-center resources. $N$ indicates the total system dimension $N = n_{\rm ch} \times n_{\rm lag}$, and cross sections remain accurate within the required nuclear-physics tolerances.$

Abstract

I present HPRMAT, a high-performance solver library for the linear systems arising in R-matrix coupled-channel scattering calculations in nuclear physics. Designed as a drop-in replacement for the linear algebra routines in existing R-matrix codes, HPRMAT employs direct linear equation solving with optimized libraries instead of traditional matrix inversion, achieving significant performance improvements. The package provides four solver backends: (1) double-precision LU factorization, (2) mixed-precision arithmetic with iterative refinement, (3) a Woodbury formula approach exploiting the kinetic-coupling matrix structure, and (4) GPU acceleration. Benchmark calculations demonstrate that the GPU solver achieves up to 9$\times$ speedup over optimized CPU direct solvers, and 18$\times$ over legacy inversion-based codes, for large matrices ($N=25600$). The mixed-precision strategy is particularly effective on consumer GPUs (e.g., NVIDIA RTX 3090/4090), where single-precision throughput exceeds double-precision by a factor of 64:1; by performing factorization in single precision with iterative refinement, HPRMAT overcomes the poor FP64 performance of consumer hardware while maintaining double-precision accuracy. This makes large-scale CDCC and coupled-channel calculations accessible to researchers using standard desktop workstations, without requiring expensive data-center GPUs. CPU-only solvers provide 5--7$\times$ speedup through optimized libraries and algorithmic improvements. All solvers maintain physics accuracy with relative errors below $10^{-5}$ in cross-section calculations, validated against Descouvemont's reference code (Comput.\ Phys.\ Commun.\ 200, 199--219 (2016)). HPRMAT provides interfaces for Fortran, C, Python, and Julia.

HPRMAT: A high-performance R-matrix solver with GPU acceleration for coupled-channel problems in nuclear physics

TL;DR

HPRMAT addresses the computational bottleneck in R-matrix coupled-channel scattering by replacing legacy inversion with direct LU solving across four backends, including GPU-accelerated and mixed-precision options. The library preserves compatibility with Descouvemont’s interface while achieving up to 9× CPU speedups and 18× over legacy codes for matrices of size up to , and maintains cross-section accuracy better than relative. The mixed-precision approach leverages the FP32:FP64 throughput advantage on consumer GPUs to factorize in with iterative refinement to accuracy, broadening accessibility to desktop workstations. Validation against Descouvemont’s reference code across multiple test cases confirms both numerical reliability and physical fidelity, enabling large-scale CDCC and coupled-channel calculations without expensive data-center resources. indicates the total system dimension , and cross sections remain accurate within the required nuclear-physics tolerances.$

Abstract

I present HPRMAT, a high-performance solver library for the linear systems arising in R-matrix coupled-channel scattering calculations in nuclear physics. Designed as a drop-in replacement for the linear algebra routines in existing R-matrix codes, HPRMAT employs direct linear equation solving with optimized libraries instead of traditional matrix inversion, achieving significant performance improvements. The package provides four solver backends: (1) double-precision LU factorization, (2) mixed-precision arithmetic with iterative refinement, (3) a Woodbury formula approach exploiting the kinetic-coupling matrix structure, and (4) GPU acceleration. Benchmark calculations demonstrate that the GPU solver achieves up to 9 speedup over optimized CPU direct solvers, and 18 over legacy inversion-based codes, for large matrices (). The mixed-precision strategy is particularly effective on consumer GPUs (e.g., NVIDIA RTX 3090/4090), where single-precision throughput exceeds double-precision by a factor of 64:1; by performing factorization in single precision with iterative refinement, HPRMAT overcomes the poor FP64 performance of consumer hardware while maintaining double-precision accuracy. This makes large-scale CDCC and coupled-channel calculations accessible to researchers using standard desktop workstations, without requiring expensive data-center GPUs. CPU-only solvers provide 5--7 speedup through optimized libraries and algorithmic improvements. All solvers maintain physics accuracy with relative errors below in cross-section calculations, validated against Descouvemont's reference code (Comput.\ Phys.\ Commun.\ 200, 199--219 (2016)). HPRMAT provides interfaces for Fortran, C, Python, and Julia.

Paper Structure

This paper contains 24 sections, 6 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Schematic structure of the R-matrix Hamiltonian for a 4-channel problem ($n_{\rm ch} = 4$) with local potentials. Diagonal blocks (blue, dense pattern) contain the full kinetic energy plus Bloch operator matrices $\mathbf{K}_\alpha$ and diagonal potentials; off-diagonal blocks (orange, diagonal line) are diagonal coupling matrices $\mathbf{D}_{\alpha\alpha'}$. Each block has dimension $n_{\rm lag} \times n_{\rm lag}$, giving total matrix dimension $N = n_{\rm ch} \times n_{\rm lag}$. For non-local potentials, all blocks become full matrices.
  • Figure 2: Scaling of computation time with matrix dimension $N$ (log-log scale). All solvers exhibit the expected $O(N^3)$ scaling (dashed line). Note: Absolute timings are hardware-dependent; this plot shows results on a specific test system (Intel Xeon Gold 6248R + NVIDIA RTX 3090) and is intended to illustrate the relative performance of different solvers and the scaling behavior, not absolute performance on other hardware.
  • Figure 3: HPRMAT solver selection and fallback mechanism. Each solver type has automatic fallback paths to ensure numerical safety: Type 2 falls back to FP64 if single-precision LU fails; Type 4 falls back to CPU solvers if GPU is unavailable or cuSOLVER fails. Type 3 (Woodbury) requires local potentials and is not applicable for non-local interactions.