Implementation of the multigrid Gaussian-Plane-Wave algorithm with GPU acceleration in PySCF

Rui Li; Xing Zhang; Qiming Sun; Yuanheng Wang; Junjie Yang; Garnet Kin-Lic Chan

Abstract

We introduce a GPU-accelerated multigrid Gaussian-Plane-Wave density fitting (FFTDF) approach for efficient Fock builds and nuclear gradient evaluations within Kohn-Sham density functional theory, as implemented in the GPU4PySCF module of PySCF. Our CUDA kernels employ a grid-based parallelization strategy for contracting Gaussian basis function pairs and achieve up to 80% of the FP64 peak performance on NVIDIA GPUs, with no loss of efficiency for functions of high angular momentum (up to f-shell). Benchmark calculations on molecules and solids with up to 1536 atoms and 20480 basis functions show up to 25x speedup on an H100 GPU relative to the CPU implementation on a 28-core shared-memory node. For a 256-water cluster, the ground-state energy and nuclear gradients can be computed in about 30 seconds on a single H100 GPU. This implementation serves as an open-source foundation for applications such as ab initio molecular dynamics and high-throughput calculations.
