Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs
John Tramm, Paul Romano, Patrick Shriwise, Amanda Lund, Johannes Doerfert, Patrick Steinbrecher, Andrew Siegel, Gavin Ridley
TL;DR
OpenMC's GPU port, built on OpenMP target offloading, is evaluated for performance portability across AMD, NVIDIA, and Intel GPUs on the Frontier, Polaris, and Aurora supercomputers. The study analyzes event-based GPU parallelism, sorting, and other optimizations, and benchmarks the GPU port against CPU baselines and other CPU-based Monte Carlo codes. Results show robust cross-vendor performance, including excellent weak scaling and high per-node throughput, with Intel Ponte Vecchio GPUs delivering leading performance on depleted-fuel small modular reactor (SMR) problems. The work demonstrates the viability of portable GPU-based Monte Carlo simulation at exascale and highlights the potential of OpenMP offloading for production HPC codes.
Abstract
OpenMC is an open source Monte Carlo neutral particle transport application that has recently been ported to GPUs using the OpenMP target offloading model. We examine the performance of OpenMC at scale on the Frontier, Polaris, and Aurora supercomputers, demonstrating that OpenMC achieves performance portability across all three major GPU vendors (AMD, NVIDIA, and Intel). OpenMC's GPU performance is compared with both the traditional CPU-based version of OpenMC and several other state-of-the-art CPU-based Monte Carlo particle transport applications. We also provide historical context by analyzing OpenMC's performance on several legacy GPU and CPU architectures. This work includes some of the first published results for a scientific simulation application at scale on a supercomputer featuring Intel's Max series "Ponte Vecchio" GPUs. It is also one of the first demonstrations of a large scientific production application using the OpenMP target offloading model to achieve high performance on all three major GPU platforms.
