Testing and benchmarking emerging supercomputers via the MFC flow solver

Benjamin Wilfong, Anand Radhakrishnan, Henry A. Le Berre, Tanush Prathi, Stephen Abbott, Spencer H. Bryngelson

TL;DR

The paper addresses how to test and benchmark emerging supercomputers using a portable CFD code workflow. It presents MFC, a CFD code whose toolchain automates input generation, compilation, regression testing, and benchmarking, so that users can compare compiler-hardware configurations with minimal software engineering experience. Key contributions include a standardized 3D benchmark case, a regression test framework with UUID-tagged golden data, and an automated performance suite that reveals compiler and kernel bottlenecks across a wide range of CPUs, GPUs, and APUs. The work demonstrates MFC’s scalability and reliability as a practical, open-source tool for evaluating new hardware, and its results inform improvements in compilers and HPC systems.

Abstract

Deploying new supercomputers requires testing and evaluation via application codes. Portable, user-friendly tools enable evaluation, and the Multicomponent Flow Code (MFC), a computational fluid dynamics (CFD) code, addresses this need. MFC is equipped with a toolchain that automates input generation, compilation, batch job submission, regression testing, and benchmarking. The toolchain design enables users with limited software engineering experience to evaluate compiler-hardware combinations for correctness and performance. As with other PDE solvers, wall time per spatially discretized grid point serves as a figure of merit. We present MFC benchmarking results for five generations of NVIDIA GPUs, three generations of AMD GPUs, and various CPU architectures, utilizing Intel, Cray, NVIDIA, AMD, and GNU compilers. These tests have revealed compiler bugs and regressions on recent machines such as Frontier and El Capitan. MFC has benchmarked approximately 50 compute devices and 5 flagship supercomputers.
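
The figure of merit named in the abstract, wall time per grid point, can be illustrated with a minimal sketch. The function name `grind_time_ns`, the additional normalization by the number of time steps, the nanosecond unit, and the sample numbers are assumptions made for illustration; they are not taken from the MFC toolchain and are not measured results.

```python
def grind_time_ns(wall_time_s: float, grid_points: int, time_steps: int = 1) -> float:
    """Return wall time per grid point per time step, in nanoseconds.

    Hypothetical helper: dividing by the number of time steps is an
    assumption that makes runs of different length comparable.
    """
    if wall_time_s < 0:
        raise ValueError("wall_time_s must be non-negative")
    if grid_points <= 0 or time_steps <= 0:
        raise ValueError("grid_points and time_steps must be positive")
    # Convert seconds to nanoseconds, then normalize by the total work.
    return wall_time_s * 1e9 / (grid_points * time_steps)


# Made-up inputs: a 200^3 grid advanced for 100 steps in 12 seconds.
example = grind_time_ns(wall_time_s=12.0, grid_points=200**3, time_steps=100)
# example == 15.0 (nanoseconds per grid point per step)
```

A lower value means the device processes each grid point faster, so the same case run on two compiler-hardware combinations can be compared with a single number.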

Paper Structure

This paper contains 15 sections, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: The MFC toolchain and its connectivity.
  • Figure 2: Weak scaling results for MFC on five flagship supercomputers. Near-ideal scaling is observed for multiple generations of AMD and NVIDIA hardware. Table \ref{tab:wsNumbs} shows the details of each system's base case, limit case, and efficiency.
  • Figure 3: Strong scaling performance on (a) OLCF Frontier and (b) CSCS Alps. The speedup is calculated as the ratio of the grindtime for a given processor count to the grindtime of the 8-rank baseline. The OLCF Frontier results show the impact of using GPU-aware MPI to reduce communication overhead and improve strong scaling efficiency. On CSCS Alps, a larger base case extends the near-ideal strong scaling behavior.
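
The scaling quantities referred to in the captions of Figures 2 and 3 can be sketched as follows. This is a minimal illustration that assumes the conventional orientation, in which speedup is the baseline time divided by the time at the larger rank count, so that values above 1 indicate faster runs. The function names and the sample timings are hypothetical and are not results from the paper.

```python
def strong_scaling_speedup(base_time: float, time: float) -> float:
    """Speedup relative to the baseline for a fixed total problem size."""
    if base_time <= 0 or time <= 0:
        raise ValueError("times must be positive")
    return base_time / time


def strong_scaling_efficiency(base_ranks: int, base_time: float,
                              ranks: int, time: float) -> float:
    """Measured speedup divided by the ideal speedup ranks / base_ranks."""
    if base_ranks <= 0 or ranks <= 0:
        raise ValueError("rank counts must be positive")
    ideal = ranks / base_ranks
    return strong_scaling_speedup(base_time, time) / ideal


def weak_scaling_efficiency(base_time: float, time: float) -> float:
    """With fixed work per rank, ideal scaling keeps the time constant."""
    if base_time <= 0 or time <= 0:
        raise ValueError("times must be positive")
    return base_time / time


# Made-up timings: an 8-rank baseline compared with a 64-rank run.
speedup = strong_scaling_speedup(base_time=16.0, time=2.5)
efficiency = strong_scaling_efficiency(8, 16.0, 64, 2.5)
```

For these made-up inputs the speedup is about 6.4 against an ideal of 8, so the strong scaling efficiency is about 0.8. A weak scaling efficiency near 1 corresponds to the near-ideal behavior described for Figure 2.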