Regent-based parallel meshfree LSKUM solver for heterogeneous HPC platforms
Sanath Salil, Nischay Ram Mamidi, Anil Nemili, Elliott Slaughter
TL;DR
This work develops a Regent-based meshfree LSKUM solver for the 2D inviscid Euler equations on heterogeneous HPC platforms and benchmarks it against CUDA-C and Fortran+MPI implementations. It introduces a GPU-accelerated Regent path with implicit data movement and a CPU-parallel path based on METIS and Legion, verified on the NACA 0012 test cases. Results show that Regent achieves portable performance and competitive CPU scaling, but its GPU speedups lag behind CUDA-C due to lower SM utilisation and more warp stalls; kernel-level optimisations such as splitting the flux_residual kernel help narrow the gap. The findings support Regent as a viable, maintainable route for cross-architecture CFD codes, with future work targeting 3D extensions and broader hardware support.
Abstract
Regent is an implicitly parallel programming language that allows the development of a single codebase for heterogeneous platforms targeting CPUs and GPUs. This paper presents the development of a parallel meshfree solver in Regent for two-dimensional inviscid compressible flows. The meshfree solver is based on the least squares kinetic upwind method. Example codes are presented to show the difference between the Regent and CUDA-C implementations of the meshfree solver on a GPU node. For CPU parallel computations, details are presented on how the data communication and synchronisation are handled by Regent and Fortran+MPI codes. The Regent solver is verified by applying it to the standard test cases for inviscid flows. Benchmark simulations are performed on coarse to very fine point distributions to assess the solver's performance. The computational efficiency of the Regent solver on an A100 GPU is compared with an equivalent meshfree solver written in CUDA-C. The codes are then profiled to investigate the differences in their performance. The performance of the Regent solver on CPU cores is compared with an equivalent explicitly parallel Fortran meshfree solver based on MPI. Scalability results are shown to offer insights into performance.
