Performance and scaling of the LFRic weather and climate model on different generations of HPE Cray EX supercomputers

J. Mark Bull, Andrew Coughtrie, Deva Deeptimahanti, Mark Hedley, Caoimhín Laoide-Kemp, Christopher Maynard, Harry Shepherd, Sebastiaan van de Bund, Michèle Weiland, Benjamin Went

TL;DR

This study evaluates the performance and scaling of the LFRic weather and climate model, focusing on the GungHo dynamical core, across multiple generations of HPE Cray EX systems. The model's parallel code is generated by the domain-specific compiler PSyclone, and the study examines strong and weak scaling, OpenMP/MPI configurations, and I/O behaviour with XIOS, YAXT, and Lustre storage. Key findings show good scalability up to large node counts, with global-sum operations limiting scaling and modest gains from multi-threading depending on configuration; I/O performance becomes a dominant factor under high diagnostic loads, with substantial improvements achievable through configuration and storage optimisations. The work highlights practical optimisation opportunities (vectorisation, data locality, halo overlap, and I/O tuning) to improve performance on current Cray EX platforms and informs future work toward Exascale readiness.

Abstract

This study presents scaling results and a performance analysis across different supercomputers and compilers for the Met Office weather and climate model, LFRic. The model is shown to scale to large numbers of nodes, which meets the design criterion of exploiting parallelism to achieve good scaling. The model is written in a Domain-Specific Language embedded in modern Fortran, and uses a Domain-Specific Compiler, PSyclone, to generate the parallel code. The performance analysis shows the effect of algorithmic choices, such as redundant computation, and the scaling with OpenMP threads. The analysis can be used to motivate a discussion of future work to improve the OpenMP performance of other parts of the code. Finally, an analysis of the performance tuning of the I/O server, XIOS, is presented.

Paper Structure

This paper contains 21 sections, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: Strong scaling behaviour of GungHo for different mesh sizes on ARCHER2 (top row) and Setonix (bottom row) using Cray compilers (left column) and GNU compilers (right column).
  • Figure 2: Weak scaling and threading performance of GungHo for local areas of size 256 grid cells per core (top row) and 128 grid cells per core (bottom row).
  • Figure 3: Breakdown of execution time of GungHo for a C512 mesh on 48 nodes, on ARCHER2 (left) and Setonix (right) using the Cray compiler and GNU compiler.
  • Figure 4: Breakdown of execution time of GungHo for a fixed local area of 256 grid cells, on ARCHER2 (left) and Setonix (right) using the Cray compiler and GNU compiler.
  • Figure 5: Ratio of execution time of GungHo using the Cray compiler to that using the GNU compiler on ARCHER2 (top row) and Setonix (bottom row).
  • ...and 5 more figures