FourPhonon_GPU: A GPU-accelerated framework for calculating phonon scattering rates and thermal conductivity

Ziqi Guo, Xiulin Ruan, Guang Lin

TL;DR

The paper tackles the high computational cost of fully resolving phonon scattering, which scales as $N^3$ for three-phonon (3ph) and $N^4$ for four-phonon (4ph) processes and hinders accurate thermal conductivity predictions. It introduces FourPhonon_GPU, a GPU-accelerated framework built on the FourPhonon package that uses OpenACC to implement a heterogeneous CPU–GPU workflow: the CPU enumerates scattering events and the GPU evaluates their rates in parallel, preserving accuracy while reducing runtime. Key contributions include over $25$-fold acceleration of the scattering-rate step and over $10$-fold total runtime speedup, a detailed comparison of GPU-only offload versus CPU–GPU hybrid strategies, and benchmarks across GPU architectures (A100 > A30 > A10) on silicon as a test case, with explicit memory considerations for dense q-meshes. The work enables rigorous first-principles phonon transport calculations at scale, offers a practical path toward accelerated materials discovery, and outlines future directions such as iterative solvers and mixed-precision approaches.
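
To make the quoted scaling concrete, the short sketch below computes how the cost grows when the size parameter $N$ is multiplied by some factor. It assumes pure power-law growth with no prefactor, leaves $N$ as whatever size measure the paper uses (this summary does not define it), and `cost_growth` is a hypothetical helper, not part of FourPhonon.

```python
def cost_growth(scale, order):
    """Factor by which an N**order cost grows when N is multiplied by `scale`."""
    return scale ** order

# Doubling N under the assumed power laws:
growth_3ph = cost_growth(2, 3)  # three-phonon, N^3
growth_4ph = cost_growth(2, 4)  # four-phonon, N^4
print(growth_3ph, growth_4ph)   # prints "8 16"
```

Under this assumption, doubling $N$ multiplies the 3ph cost by 8 and the 4ph cost by 16, consistent with the summary's emphasis on four-phonon scattering as the more expensive case.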

Abstract

Accurately predicting phonon scattering is crucial for understanding thermal transport properties. However, the computational cost of such calculations, especially for four-phonon scattering, can be prohibitive when a large number of phonon branches and scattering processes are involved. In this work, we present FourPhonon_GPU, a GPU-accelerated framework for three-phonon and four-phonon scattering rate calculations based on the FourPhonon package. By leveraging OpenACC and adopting a heterogeneous CPU-GPU computing strategy, we efficiently offload massive, parallelizable tasks to the GPU while using the CPU for process enumeration and control-heavy operations. Our approach achieves over 25x acceleration for the scattering rate computation step and over 10x total runtime speedup without sacrificing accuracy. Benchmarking on various GPU architectures confirms the method's scalability and highlights the importance of aligning parallelization strategies with hardware capabilities. This work provides an efficient and accurate computational tool for phonon transport modeling and opens pathways for accelerated materials discovery.
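
The division of labor described in the abstract can be illustrated with a minimal two-stage sketch. It is not the FourPhonon_GPU implementation, which uses OpenACC, and everything in it is an assumption made for illustration: the toy frequencies, the simplified selection rule $\omega_i + \omega_j = \omega_k$, the placeholder Gaussian weight, and the function names `enumerate_allowed` and `evaluate_rates`. The second stage runs serially here and only emulates the uniform, branch-free work that would be offloaded to a GPU.

```python
import math

def enumerate_allowed(freqs, tol):
    """CPU-side stage: branch-heavy filtering of (i, j, k) triplets.

    A triplet is kept only if freqs[i] + freqs[j] matches freqs[k]
    within `tol` (a toy stand-in for the real selection rules).
    """
    n = len(freqs)
    allowed = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if abs(freqs[i] + freqs[j] - freqs[k]) <= tol:
                    allowed.append((i, j, k))
    return allowed

def evaluate_rates(freqs, allowed, sigma):
    """GPU-side stage, emulated serially: every entry of `allowed` runs the
    same arithmetic with no filtering branch. On a real device the
    accumulation into rates[i] would need a reduction or atomic update."""
    rates = [0.0] * len(freqs)
    for i, j, k in allowed:
        mismatch = freqs[i] + freqs[j] - freqs[k]
        rates[i] += math.exp(-(mismatch / sigma) ** 2)  # placeholder weight
    return rates

freqs = [1.0, 2.0, 3.0, 4.0]  # toy mode frequencies, arbitrary units
allowed = enumerate_allowed(freqs, tol=1e-9)
rates = evaluate_rates(freqs, allowed, sigma=0.1)
```

Filtering first means the evaluation stage receives a dense list of valid processes, so no work item has to test and discard a forbidden process at evaluation time.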

Paper Structure

This paper contains 4 sections, 7 figures, 2 tables, 5 algorithms.

Figures (7)

  • Figure 1: Workflow of CPU-GPU heterogeneous computing.
  • Figure 2: GPU-only vs. CPU+GPU heterogeneous computing. The GPU-only approach suffers from warp divergence due to conditional branching when filtering forbidden phonon scattering processes, leading to reduced computational efficiency. In contrast, the CPU+GPU heterogeneous strategy first filters and prepares valid scattering processes on the CPU, allowing the GPU to execute the computation with higher parallel efficiency.
  • Figure 3: Triangular iteration region due to symmetry.
  • Figure 4: Comparison of total computational cost between CPU-only and CPU-GPU hybrid implementations across different q-mesh sizes. (a) 3ph scattering, (b) 3ph+4ph scattering. The insets show the isolated computational cost of the 3ph and 4ph scattering step alone.
  • Figure 5: Comparison between all-modes and mode-by-mode parallelization strategies. (a) Computational cost and (b) GPU memory usage for 3ph and 3ph+4ph scattering calculations using q-meshes of $32\times32\times32$ and $10\times10\times10$, respectively. (c) Computational cost for 3ph+4ph scattering with a $16\times16\times16$ q-mesh, comparing the CPU-only implementation with the CPU-GPU implementation using mode-by-mode parallelization.
  • ...and 2 more figures
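
The triangular iteration region named in the Figure 3 caption can be illustrated with a generic sketch. The caption does not state which symmetry is used, so the code below assumes a term that is unchanged when its two indices are swapped. The names `full_sum`, `triangular_sum`, and `symmetric_term` are hypothetical and do not come from FourPhonon.

```python
def full_sum(n, f):
    """Sum f(i, j) over the full n x n square of index pairs."""
    return sum(f(i, j) for i in range(n) for j in range(n))

def triangular_sum(n, f):
    """Sum the same quantity over the triangle j >= i only.

    Valid only if f(i, j) == f(j, i): each off-diagonal pair is visited
    once and counted twice, each diagonal pair is counted once.
    """
    total = 0
    for i in range(n):
        for j in range(i, n):
            weight = 1 if i == j else 2
            total += weight * f(i, j)
    return total

def symmetric_term(i, j):
    """Toy exchange-symmetric term, chosen only for illustration."""
    return (i + 1) * (j + 1)

n = 5
full = full_sum(n, symmetric_term)
tri = triangular_sum(n, symmetric_term)
pairs_visited = n * (n + 1) // 2  # 15 pairs instead of 25
```

For large `n` the triangle visits roughly half as many index pairs as the full square while producing the same total.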