In-Memory Load Balancing for Discontinuous Galerkin Methods on Polytopal Meshes
Patrick Kopper, Anna Schwarz, Jens Keim, Andrea Beck
TL;DR
This paper tackles the workload imbalance that arises in high-order discontinuous Galerkin (DG) simulations on heterogeneous polytopal meshes when modal, rather than nodal, degrees of freedom are advanced in time. It introduces a lightweight, in-memory load balancing approach, implemented within the FLEXI DG framework, that uses high-precision runtime measurements to reassign elements along a Hilbert space-filling curve. The authors demonstrate that this strategy recovers a substantial portion of the efficiency lost on mixed-element meshes and preserves strong and weak scaling both on single nodes and on large-scale HPC systems, including MareNostrum 5. By providing a system-agnostic, low-overhead mechanism, the work broadens the applicability of FLEXI to complex geometries and large-scale simulations with mixed element types.
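To illustrate what ordering elements "along a Hilbert space-filling curve" means, here is a minimal sketch. The summary does not describe how FLEXI builds its curve, so the example assumes a simplified 2D setting: each element is identified by an integer cell `(x, y)` on an `n x n` grid with `n` a power of two, for instance after binning element barycenters. It uses the classic bit-manipulation mapping from grid cells to Hilbert distance. The functions `hilbert_index` and `hilbert_order` are hypothetical illustrations, not FLEXI routines.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Distance of cell (x, y) along a 2D Hilbert curve on an n x n grid.

    Assumption: n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant at level s holds s*s consecutive curve positions.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the sub-curve in this quadrant has canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_order(n: int) -> list[tuple[int, int]]:
    """All cells of the n x n grid, sorted by their position on the Hilbert curve."""
    cells = [(x, y) for x in range(n) for y in range(n)]
    return sorted(cells, key=lambda c: hilbert_index(n, c[0], c[1]))


print(hilbert_order(4)[:4])  # [(0, 0), (1, 0), (1, 1), (0, 1)]
```

Consecutive cells on the curve are always face neighbors. As a result, any contiguous chunk of the ordered list forms a spatially compact subdomain. This property is what makes a space-filling curve attractive for partitioning: load balancing then reduces to choosing where to cut a 1D list.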
Abstract
High-order accurate discontinuous Galerkin (DG) methods have emerged as powerful tools for solving partial differential equations such as the compressible Navier-Stokes equations, owing to their excellent dispersion-dissipation properties and scalability on modern hardware. The open-source DG framework FLEXI has recently been extended to support DG schemes on general polytopal elements, including tetrahedra, prisms, and pyramids. This advancement enables simulations on complex geometries where purely hexahedral meshes are difficult or impossible to generate. However, meshes with heterogeneous element types introduce a workload imbalance, because the degrees of freedom evolved in time are modal rather than nodal, and the accompanying transformations incur different costs for different element types. In this work, we present a lightweight, system-agnostic in-memory load balancing strategy designed for high-order DG solvers. The method employs high-precision runtime measurements and efficient data redistribution to dynamically reassign mesh elements along a space-filling curve. We demonstrate the effectiveness of the approach through simulations of the Taylor-Green vortex and large-scale parallel runs on the EuroHPC pre-exascale system MareNostrum 5. Results show that the proposed strategy recovers a significant fraction of the efficiency lost on heterogeneous meshes while retaining excellent strong and weak scaling.
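Once elements are ordered along the curve, reassigning them from runtime measurements amounts to cutting the 1D list into contiguous chunks of roughly equal measured cost. The sketch below shows one common way to do this: a greedy prefix-sum cut that places each rank boundary as close as possible to an equal share of the total cost. It assumes that `weights` holds per-element wall-clock times already sorted in curve order. The paper's exact partitioning rule and its data redistribution step (for example, the MPI exchange of element data) are not shown. The functions `partition_along_curve`, `rank_loads`, and `imbalance` and the example timings are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def partition_along_curve(weights: list[float], n_ranks: int) -> list[int]:
    """Cut curve-ordered elements into n_ranks contiguous chunks of similar cost.

    Returns offsets of length n_ranks + 1; rank r owns elements
    offsets[r]:offsets[r + 1]. Assumes all weights are positive.
    """
    prefix = list(accumulate(weights))  # prefix[i] = cost of elements 0..i
    total = prefix[-1]
    offsets = [0]
    for r in range(1, n_ranks):
        target = r * total / n_ranks  # ideal cumulative cost up to this boundary
        i = bisect_left(prefix, target)  # first element whose prefix reaches target
        below = prefix[i - 1] if i > 0 else 0.0  # cost if we cut before element i
        # Cut after element i if that overshoots the target less than cutting before it undershoots.
        cut = i + 1 if prefix[i] - target <= target - below else i
        offsets.append(max(cut, offsets[-1]))  # keep offsets non-decreasing
    offsets.append(len(weights))
    return offsets


def rank_loads(weights: list[float], offsets: list[int]) -> list[float]:
    """Total measured cost assigned to each rank."""
    return [sum(weights[a:b]) for a, b in zip(offsets, offsets[1:])]


def imbalance(loads: list[float]) -> float:
    """Slowest rank relative to the mean load; 1.0 means perfect balance."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical timings: the last four elements are three times as expensive.
weights = [1.0] * 4 + [3.0] * 4
uniform = [0, 4, 8]  # equal element count per rank
balanced = partition_along_curve(weights, 2)
print(balanced)                                  # [0, 5, 8]
print(imbalance(rank_loads(weights, uniform)))   # 1.5
print(imbalance(rank_loads(weights, balanced)))  # 1.125
```

Because the cut positions depend only on prefix sums, every rank can compute the same offsets independently once the measured weights are globally available. This is one reason curve-based repartitioning can remain lightweight.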
