Electron-phonon physics at the exascale: A hybrid MPI-GPU-OpenMP framework for scalable Wannier interpolation
Tae Yun Kim, Zhe Liu, Sabyasachi Tiwari, Elena R. Margine, Feliciano Giustino
TL;DR
A GPU porting strategy is designed that integrates naturally into the current EPW implementation and is seamlessly portable to NVIDIA, AMD, and Intel GPUs. It achieves up to 29-fold speedup on leadership-class supercomputers equipped with NVIDIA and Intel accelerators.
Abstract
We present a highly efficient GPU implementation of the Wannier interpolation of electron-phonon matrix elements in the EPW code. Building on a systematic analysis of the computational complexity of the electron-phonon interpolation algorithm, we designed a GPU porting strategy that integrates naturally into the current EPW implementation and is seamlessly portable to NVIDIA, AMD, and Intel GPUs. We demonstrate this development via extensive benchmarks on conventional semiconductors such as silicon and monolayer MoS$_2$, as well as a large-scale application to topological stanene nanoribbons with widths up to 20 nm, a calculation that was intractable with previous implementations. Compared to the single MPI parallelization scheme of EPW v5.9, the resulting hybrid MPI-GPU-OpenMP scheme achieves up to 29-fold speedup on leadership-class supercomputers equipped with NVIDIA and Intel accelerators, namely Vista at the Texas Advanced Computing Center, Perlmutter at the National Energy Research Scientific Computing Center, and Aurora at the Argonne Leadership Computing Facility. The framework also achieves nearly ideal scalability up to thousands of GPU nodes on the Aurora supercomputer. With this development, EPW is ready to support electron-phonon physics calculations on exascale platforms.
