
Memory-Efficient Recursive Evaluation of 3-Center Gaussian Integrals

Andrey Asadchev, Edward F. Valeev

TL;DR

This work introduces memory-efficient recursive evaluation of 3-center 2-body Gaussian integrals on modern GPUs by adopting multiquantal recurrences, in-place use of registers and shared memory, and compile-time kernel generation via C++17. The authors develop an Obara-Saika-based HGPA framework with VRR1, VRR2, and HRR variants, including 2-q VRR1/HRR and n-q CVRR2, to minimize memory footprints while maintaining performance. Implemented in LibintX, the approach demonstrates competitive performance on CPU and GPU, with notable speedups for density-fitting Coulomb potential evaluations and practical memory savings enabling higher angular momenta. The techniques are presented as broadly applicable beyond 3-center integrals, offering a path toward efficient Coulomb evaluations and potential extensions to four-center cases. The open-source LibintX library provides the implementation and API access for broader adoption and future optimizations.

Abstract

To improve the efficiency of Gaussian integral evaluation on modern accelerated architectures, FLOP-efficient Obara-Saika-based recursive evaluation schemes are optimized for the memory footprint. For the 3-center 2-particle integrals that are key for the evaluation of Coulomb and other 2-particle interactions in the density-fitting approximation, the use of multi-quantal recurrences (in which multiple quanta are created or transferred at once) is shown to produce significant memory savings. Other innovations include leveraging register memory for reduced memory footprint and direct compile-time generation of optimized kernels (instead of custom code generation) with compile-time features of modern C++/CUDA. Performance of conventional and CUDA-based implementations of the proposed schemes is illustrated for both the individual batches of integrals involving Gaussians with low and high angular momenta (up to $L=6$) and contraction degrees, as well as for the density-fitting-based evaluation of the Coulomb potential. The computer implementation is available in the open-source LibintX library.
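
As an illustration of what "compile-time features of modern C++" can look like for memory sizing, the following is a minimal C++17 sketch, not LibintX code. It assumes Cartesian Gaussians, for which a shell of angular momentum $L$ has $(L+1)(L+2)/2$ components, and the textbook Head-Gordon-Pople horizontal recurrence, in which a bra $(ab|$ with $a \geq b$ is built from $[e0|$ shell sets with $e = a, \dots, a+b$. The names `ncart`, `ncart_range`, and `BraScratch` are hypothetical and chosen only for this example.

```cpp
#include <array>
#include <cstddef>

// Number of Cartesian Gaussian components in a shell of angular momentum L.
constexpr std::size_t ncart(std::size_t L) {
  return (L + 1) * (L + 2) / 2;
}

// Total number of Cartesian components in shells lo..hi (inclusive).
constexpr std::size_t ncart_range(std::size_t lo, std::size_t hi) {
  std::size_t n = 0;
  for (std::size_t l = lo; l <= hi; ++l) n += ncart(l);
  return n;
}

// Hypothetical scratch buffer whose size is fixed at compile time from the
// bra angular momenta. Assumption: textbook HRR input range e = La..La+Lb;
// this is not the memory layout used by LibintX.
template <std::size_t La, std::size_t Lb>
struct BraScratch {
  static_assert(La >= Lb, "this sketch assumes La >= Lb");
  static constexpr std::size_t size = ncart_range(La, La + Lb);
  std::array<double, size> data{};
};

// Checked by the compiler: a (dd| bra needs d, f, and g shell sets.
static_assert(ncart(2) == 6);
static_assert(ncart(6) == 28);
static_assert(BraScratch<2, 2>::size == 6 + 10 + 15);
```

Because every size is a constant expression, each `BraScratch<La, Lb>` instantiation gets a fixed-size array and the compiler can fully unroll loops over it, which is the general idea behind generating kernels with templates rather than with an external code generator.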

Paper Structure (16 sections, 10 equations, 2 figures, 6 tables)

This paper contains 16 sections, 10 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Mapping of the work and data flow onto the device compute resources and shared memory within the VRR1 part of the kernel for computing $(dd|s)$ using a 1-q VRR1 variant with a simple static memory allocation strategy. Specifically, the shared memory layout adopted during generation of the $[e]$ shell sets writes the $[e]^{(m\geq c+1)}$ shell sets to a block of memory sufficient to hold the target $[e]^{(c)}$ shell set.
  • Figure 2: Mapping of the work and data flow onto the device compute resources and shared memory within the VRR1 part of the kernel for computing $(dd|s)$ using the 2-q VRR1 variant.