Acceleration of the CASINO quantum Monte Carlo software using graphics processing units and OpenACC

B. Thorpe, M. J. Smith, P. J. Hasnip, N. D. Drummond

TL;DR

Results are presented for three- and two-dimensional homogeneous electron gases and ab initio simulations of bulk materials, showing that significant speedups of up to a factor of 2.5 can be achieved by the use of GPUs when several hundred particles are included in the simulations.

Abstract

We describe how quantum Monte Carlo calculations using the CASINO software can be accelerated using graphics processing units (GPUs) and OpenACC. In particular, we consider offloading Ewald summation, the evaluation of long-range two-body terms in the Jastrow correlation factor, and the evaluation of orbitals in a blip basis set. We present results for three- and two-dimensional homogeneous electron gases and ab initio simulations of bulk materials, showing that significant speedups of up to a factor of 2.5 can be achieved by the use of GPUs when several hundred particles are included in the simulations. The use of single-precision arithmetic can improve the speedup further without significant detriment to the accuracy of the calculations.

Paper Structure

This paper contains 24 sections, 7 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Schematic of an NVIDIA A100 die and an individual SM.
  • Figure 2: Runtime comparison for variational Monte Carlo (VMC) simulations of the 3D homogeneous electron gas (HEG) with and without offloading of Ewald interactions to the GPU. All runs were performed using all 32 cores of a POWER9 CPU @ 2.7 GHz with a single NVIDIA V100 GPU.
  • Figure 3: Runtime comparison for VMC simulations of a 3D HEG with and without offloading of the two-body Jastrow $p$ terms to the GPU. The results have been averaged over five runs. All runs were performed using all cores of a 32-core POWER9 CPU @ 2.7 GHz with an NVIDIA V100 GPU.
  • Figure 4: Runtime comparison of OpenBLAS (on CPU) vs. cuBLAS (on GPU) for the ddot function with an increasing number of vector elements. These calculations were performed using all cores of a 32-core POWER9 CPU @ 2.7 GHz and an NVIDIA V100 GPU.
  • Figure 5: Runtime comparison for VMC simulations of a 3D HEG with double-precision (DP) and single-precision (SP) arithmetic, averaged over five runs. All runs were performed using all cores of a 32-core POWER9 CPU @ 2.7 GHz with an NVIDIA V100 GPU.
  • ...and 3 more figures