A Study of Performance Portability in Plasma Physics Simulations

Josef Ruzicka, Christian Asch, Esteban Meneses, Markus Rampp, Erwin Laure

TL;DR

The results show that both Kokkos and OpenMP are powerful tools to achieve performance portability and decent "out-of-the-box" performance, even for the very latest hardware platforms.

Abstract

The high-performance computing (HPC) community has recently seen a substantial diversification of hardware platforms and their associated programming models. From traditional multicore processors to highly specialized accelerators, these architectures continue to advance relentlessly, backed by vendors and tool developers. In scientific programming, it is therefore essential to consider performance portability frameworks, i.e., software tools that allow programmers to write code once and run it on different computer architectures without sacrificing performance. We report here on the benefits and challenges of performance portability using a field-line tracing simulation and a particle-in-cell code, two applications from computational plasma physics that are relevant to magnetic-confinement nuclear-fusion energy research. For these applications, we report performance results obtained on four HPC platforms with server-class CPUs from Intel (Xeon) and AMD (EPYC), and high-end GPUs from Nvidia and AMD, including the latest Nvidia H100 GPU and the novel AMD Instinct MI300A APU. Our results show that both Kokkos and OpenMP are powerful tools to achieve performance portability and decent "out-of-the-box" performance, even for the very latest hardware platforms. For our applications, Kokkos provided performance portability to the broadest range of hardware architectures from different vendors.
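
To make the "write once, run on different architectures" idea concrete, here is a minimal sketch that is not taken from the paper's codes. It assumes a C++ compiler with OpenMP 4.5 or later; the function `push_particles` and its simple explicit Euler update of particle positions are hypothetical, chosen only because they resemble one step of a particle-in-cell loop. The same source builds serially (the pragma is ignored), with host threads (e.g. `-fopenmp`), or with GPU offload when the compiler is configured for a device target.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration (not the paper's code): advance every particle
// one explicit Euler step, x_i <- x_i + v_i * dt.
// A single source serves all targets:
//   - no OpenMP flag: the pragma is ignored and the loop runs serially;
//   - OpenMP without offload support: the target region runs on the host;
//   - OpenMP with offload support: the loop runs on the GPU.
void push_particles(std::vector<double>& x, const std::vector<double>& v, double dt) {
    const std::size_t n = x.size();
    double* xp = x.data();
    const double* vp = v.data();
    // map clauses copy positions to the device and back, and velocities
    // to the device only; on a host-only build they have no copying effect.
    #pragma omp target teams distribute parallel for map(tofrom: xp[0:n]) map(to: vp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        xp[i] += vp[i] * dt;
    }
}
```

In Kokkos, the analogous construct would typically be a `Kokkos::parallel_for` over data held in `Kokkos::View`s, with the backend (OpenMP, CUDA, HIP, ...) selected at build time. That variant is not shown here because it requires the Kokkos library.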

Paper Structure

This paper contains 14 sections, 1 equation, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: 2D plasma physics simulation representations
  • Figure 2: Time-to-solution for the three different problem sizes on a single CPU/GPU with the OpenMP and Kokkos code variants of BS-SOLCTRA.
  • Figure 3: BS-SOLCTRA parallel scaling (runtime as a function of employed A100 GPUs) for the Kokkos (solid lines) and the OpenMP (dashed lines) code variants for three different problem sizes (colour-coded).
  • Figure 4: Performance on a single CPU/GPU for the Kokkos (cross-hatched) and OpenMP (solid fill) variants of the PIC code for different initialization patterns. The black error bars indicate the measured run-to-run variation.
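
Figure 3 plots runtime against the number of A100 GPUs. As an aid for reading such strong-scaling plots, the textbook definitions of speedup and parallel efficiency relative to one GPU are given below. These are standard definitions, not necessarily the metrics used in the paper, with T(p) denoting the time-to-solution on p GPUs:

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad
E(p) = \frac{S(p)}{p} = \frac{T(1)}{p\,T(p)}
```

Ideal scaling gives S(p) = p and E(p) = 1. For example, hypothetical runtimes T(1) = 100 s and T(4) = 30 s would give S(4) ≈ 3.33 and E(4) ≈ 0.83.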