3-center and 4-center 2-particle Gaussian AO integrals on modern accelerated processors
Andrey Asadchev, Edward F. Valeev
TL;DR
This work extends the matrix-form McMurchie-Davidson (MD) approach to efficient GPU evaluation of 3-center and 4-center 2-particle Gaussian AO integrals across low to high angular momenta (benchmarked for $l\leq 6$). It introduces three MD variants (V0, V1, V2) tailored to different memory and compute regimes; combined with data-layout optimizations and batched GEMMs, they sustain between 25% and 70% of the theoretical hardware peak in double precision, as documented by detailed performance benchmarks. A preliminary implementation of the Hartree-Fock exchange operator demonstrates practical applicability to large systems (more than 20,000 AOs) and is integrated into the open-source LibintX library (LGPL3, C++17/CUDA). The work highlights the viability of a matrix MD-based engine for modern accelerators and outlines future enhancements, including CPU SIMD adaptations, complete exchange machinery, and derivatives for advanced electronic-structure methods.
Abstract
We report an implementation of the McMurchie-Davidson (MD) algorithm for 3-center and 4-center 2-particle integrals over Gaussian atomic orbitals (AOs) with low and high angular momenta $l$ and varying degrees of contraction for graphics processing units (GPUs). This work builds upon our recent implementation of a matrix form of the MD algorithm that is efficient for GPU evaluation of 4-center 2-particle integrals over Gaussian AOs of high angular momenta ($l\geq 4$) [$\mathit{J. Phys. Chem. A}\ \mathbf{127}$, 10889 (2023)]. Unconventional data layouts and three variants of the MD algorithm allow integrals to be evaluated in double precision with sustained performance between 25% and 70% of the theoretical hardware peak. The performance assessment covers integrals over AOs with $l\leq 6$ (higher $l$ is supported). A preliminary implementation of the Hartree-Fock exchange operator is presented and assessed for computations with basis sets up to quadruple-zeta quality and more than 20,000 AOs. The corresponding C++ code is part of the experimental open-source $\mathtt{LibintX}$ library, available at $\mathbf{github.com:ValeevGroup/LibintX}$.
