Table of Contents
Fetching ...

The First OpenFOAM HPC Challenge (OHC-1)

Sergey Lesnik, Gregor Olenik, Mark Wassermann

Abstract

The first OpenFOAM HPC Challenge (OHC-1) was organised by the OpenFOAM HPC Technical Committee (HPCTC) to collect a snapshot of OpenFOAM's computational performance on contemporary production hardware and to compare hardware-constrained submissions with software-track optimisations. Participants ran a common incompressible steady-state RANS case, the open-closed cooling DrivAer (occDrivAer) configuration, on prescribed meshes, submitting either with the reference setup (hardware track) or with modified solvers, decomposition strategies, or accelerator offloading (software track). In total, 237 valid datapoints were submitted by 12 contributors: 175 in the hardware track and 62 in the software track. The hardware track covered 25 distinct CPU models across AMD, Intel, and ARM families, with runs spanning from single-node configurations up to 256 nodes (32768 CPU cores). Wall-clock times ranged from 7.8 minutes to 65.7 hours and reported energy-to-solution from 2.1 to 236.9 kWh. Analysis of the hardware track identified a Pareto front of optimal balance between time- and energy-to-solution, and revealed that on-package high-bandwidth memory (HBM) dominates single-node performance for next-generation CPUs. Software-track submissions achieved up to 28% lower energy per iteration, 17% higher maximum performance per node, and 72% shorter minimum time per iteration than the best hardware-track results, with full GPU ports and selective-memory optimisations leading the performance range. This manuscript describes the challenge organisation, the case setup and metrics, and presents the main findings from both tracks together with an outlook for future challenges.

The First OpenFOAM HPC Challenge (OHC-1)

Abstract

The first OpenFOAM HPC Challenge (OHC-1) was organised by the OpenFOAM HPC Technical Committee (HPCTC) to collect a snapshot of OpenFOAM's computational performance on contemporary production hardware and to compare hardware-constrained submissions with software-track optimisations. Participants ran a common incompressible steady-state RANS case, the open-closed cooling DrivAer (occDrivAer) configuration, on prescribed meshes, submitting either with the reference setup (hardware track) or with modified solvers, decomposition strategies, or accelerator offloading (software track). In total, 237 valid datapoints were submitted by 12 contributors: 175 in the hardware track and 62 in the software track. The hardware track covered 25 distinct CPU models across AMD, Intel, and ARM families, with runs spanning from single-node configurations up to 256 nodes (32768 CPU cores). Wall-clock times ranged from 7.8 minutes to 65.7 hours and reported energy-to-solution from 2.1 to 236.9 kWh. Analysis of the hardware track identified a Pareto front of optimal balance between time- and energy-to-solution, and revealed that on-package high-bandwidth memory (HBM) dominates single-node performance for next-generation CPUs. Software-track submissions achieved up to 28% lower energy per iteration, 17% higher maximum performance per node, and 72% shorter minimum time per iteration than the best hardware-track results, with full GPU ports and selective-memory optimisations leading the performance range. This manuscript describes the challenge organisation, the case setup and metrics, and presents the main findings from both tracks together with an outlook for future challenges.

Paper Structure

This paper contains 21 sections, 2 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Challenge case geometry and mesh.
  • Figure 2: Overview of the OHC-1 procedure.
  • Figure 3: Example meanCalc validation for a software-track submission. Top row: instantaneous force coefficients (blue) with running mean from iteration 1265 (orange) and reference value (green). Bottom row: convergence of $2\,\sigma(\mu)$ (blue) towards the acceptance threshold of 0.0015 (orange).
  • Figure 4: Overview of hardware-track submissions, (a) Time-to-solution over number of utilized CPU cores, (b) number of submissions based on mesh type and contributor ID, (c) number of submissions based on CPU vendor and CPU Generation.
  • Figure 5: Energy-to-solution vs. time-to-solution for hardware-track runs, faceted by mesh size. Operating regimes and the Pareto front of optimal energy--time balance are indicated.
  • ...and 9 more figures