Accelerating the Particle-In-Cell code ECsim with OpenACC

Elisabetta Boella; Nitin Shukla; Filippo Spiga; Mozhgan Kabiri Chimeh; Matt Bettencourt; Maria Elena Innocenti

Abstract

The Particle-In-Cell (PIC) method is a computational technique widely used in plasma physics to model plasmas at the kinetic level. In this work, we present our effort to prepare the semi-implicit energy-conserving PIC code ECsim for exascale architectures. To achieve this, we adopted a pragma-based acceleration strategy using OpenACC, which enables high performance while requiring minimal code restructuring. On the pre-exascale Leonardo system, the accelerated code achieves a $5 \times$ speedup and a $3 \times$ reduction in energy consumption compared to the CPU reference code. Performance comparisons across multiple NVIDIA GPU generations show substantial benefits from the GH200 unified memory architecture. Finally, strong and weak scaling tests on Leonardo demonstrate efficiency of $70 \%$ and $78 \%$ up to 64 and 1024 GPUs, respectively.
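To illustrate what a pragma-based strategy means in practice, the sketch below annotates a simple particle loop with an OpenACC directive. It is a minimal example and not taken from ECsim: the `Particles` structure, the `drift` function, and the explicit position update are hypothetical, and the update is a plain drift rather than the code's semi-implicit mover. The loop body is untouched; only the directive and its data clauses are added.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays particle container (not ECsim's data layout).
struct Particles {
    std::vector<double> x;  // positions
    std::vector<double> v;  // velocities
};

// Advance positions by one step: x <- x + v * dt.
// Illustrative explicit drift only, not ECsim's semi-implicit particle mover.
void drift(Particles& p, double dt) {
    const std::size_t n = p.x.size();
    double* x = p.x.data();
    const double* v = p.v.data();

    // An OpenACC compiler offloads this loop to the GPU and moves the arrays
    // as requested by the data clauses. A compiler without OpenACC support
    // ignores the pragma and runs the same loop serially on the CPU.
    #pragma acc parallel loop copy(x[0:n]) copyin(v[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += v[i] * dt;
    }
}
```

Because the directive is a hint to the compiler rather than a change to the loop, the same source can in principle be built for CPU-only and GPU-accelerated runs, which is the sense in which the approach requires minimal restructuring.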

Paper Structure (5 sections, 6 equations, 7 figures, 2 tables)

Introduction
Code structure
Strategy for adding GPU support
Performance evaluation
Summary & Perspectives

Figures (7)

Figure 1: Walltime and speedup for a typical 2D simulation using the CPU reference (left bar) and OpenACC-accelerated (right bar) ECsim on one node of Leonardo Booster. Colours denote code sections: blue initialisation, orange moment gathering, green field solver, purple particle mover, and grey I/O. The pure CPU simulation was run with 32 MPI ranks; the heterogeneous simulation was performed with four GPUs and four MPI tasks.
Figure 2: Time evolution of the electric field energy (left panel) and magnetic field energy (right panel) from simulations of the two-stream instability and the current filamentation instability, respectively. Red solid lines show results from the CPU reference version of ECsim, while black dashed lines correspond to the accelerated version of the code. All values are normalised to reference quantities.
Figure 3: GPU utilisation (first row), frequency (second row), power (third row), and temperature (fourth row) during an ECsim run. The same use case as in Figure 1 was employed. Four GPUs were utilised, but only the values for one representative device are shown here because the results are very similar across all GPUs.
Figure 4: Distribution of energy values across ten simulations for runs on CPU only (left panel) and on GPU (right panel). The red dashed lines represent the average energy-to-solution. The same input parameters, number of MPI tasks, and number of GPUs as in Figure 1 were used.
Figure 5: Walltime results for the particle mover (left panel) and moment gathering (right panel) obtained on various NVIDIA GPU generations. The corresponding CPU configurations are provided for completeness. Speedup values shown above the bars are computed relative to the walltime obtained on the V100 GPU.
...and 2 more figures
