Memory- and compute-optimized geometric multigrid GMGPolar for curvilinear coordinate representations -- Applications to fusion plasma
Julian Litz, Philippe Leleux, Carola Kruse, Joscha Gedicke, Martin J. Kühn
TL;DR
This work addresses the efficient solution of the gyrokinetic Poisson equation on curvilinear tokamak cross sections with GMGPolar, a matrix-free geometric multigrid solver. It introduces a fully refactored, object-oriented GMGPolar with two matrix-free implementations (Give and Take), higher-order implicit extrapolation, and FMG/W/F-cycle features to achieve linear complexity and elevated accuracy. The numerical results demonstrate substantial speedups (4–7× with Give, 16–18× with Take, and 25–37× when GMGPolar is used as a preconditioner for conjugate gradients) and reduced memory footprints across multiple geometries, confirming strong and weak scalability. The approach enables fast, memory-efficient high-fidelity plasma simulations on HPC systems, with potential GPU porting and multi-patch extensions for more complex tokamak geometries.
Abstract
Tokamak fusion reactors are actively studied as a means of realizing energy production from plasma fusion. However, due to the substantial cost and time required to construct fusion reactors and run physical experiments, numerical experiments are indispensable for understanding plasma physics inside tokamaks, supporting the design and engineering phase, and optimizing future reactor designs. Geometric multigrid methods are optimal solvers for many problems that arise from the discretization of partial differential equations. It has been shown that the multigrid solver GMGPolar solves the 2D gyrokinetic Poisson equation in linear complexity and with only small memory requirements compared to other state-of-the-art solvers. In this paper, we present a completely refactored and object-oriented version of GMGPolar which offers two different matrix-free implementations. Among other things, we leverage the Sherman-Morrison formula to solve cyclic tridiagonal systems from circular line solvers without additional fill-in, and we apply reordering to optimize cache access of circular and radial smoothing operations. With the Give approach, memory requirements are further reduced and speedups by factors of four to seven are obtained for usual test cases. With the Take approach, speedups by factors of 16 to 18 can be attained. In an additional experimental setup that uses GMGPolar as a preconditioner for conjugate gradients, the speedup could be increased further to factors between 25 and 37.
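To illustrate the Sherman-Morrison idea mentioned in the abstract, the following is a minimal textbook-style sketch in Python and not GMGPolar's actual implementation. It treats a cyclic tridiagonal matrix A as a plain tridiagonal matrix B plus a rank-one correction u v^T that carries the two corner entries, so only tridiagonal solves (Thomas algorithm) are needed and no fill-in arises. The function names, the array layout (a[0] and c[n-1] hold the corner entries), and the choice gamma = -b[0] are assumptions made for this example; the sketch also assumes n >= 3 and a diagonally dominant matrix so that no pivoting is required.

```python
def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm for a tridiagonal system (no pivoting).

    a: sub-diagonal (a[0] is ignored), b: diagonal,
    c: super-diagonal (c[n-1] is ignored), d: right-hand side.
    """
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_cyclic_tridiagonal(a, b, c, d):
    """Solve A x = d for cyclic tridiagonal A via Sherman-Morrison.

    Layout (an assumption of this sketch): a[0] is the top-right corner
    A[0][n-1] and c[n-1] is the bottom-left corner A[n-1][0].
    """
    n = len(b)
    if n < 3:
        raise ValueError("this sketch assumes n >= 3")
    gamma = -b[0]  # any nonzero value works in exact arithmetic
    ratio = a[0] / gamma

    # B = A - u v^T is tridiagonal; only its first and last diagonal
    # entries differ from those of A.
    bb = list(b)
    bb[0] = b[0] - gamma
    bb[-1] = b[-1] - c[-1] * ratio

    # u = (gamma, 0, ..., 0, c[n-1])^T and v = (1, 0, ..., 0, a[0]/gamma)^T.
    u = [0.0] * n
    u[0] = gamma
    u[-1] = c[-1]

    y = solve_tridiagonal(a, bb, c, d)  # B y = d
    q = solve_tridiagonal(a, bb, c, u)  # B q = u

    # Sherman-Morrison: x = y - q * (v.y) / (1 + v.q)
    vy = y[0] + ratio * y[-1]
    vq = q[0] + ratio * q[-1]
    factor = vy / (1.0 + vq)
    return [yi - factor * qi for yi, qi in zip(y, q)]


def cyclic_matvec(a, b, c, x):
    """Multiply the cyclic tridiagonal matrix by x (used to check results)."""
    n = len(b)
    return [a[i] * x[(i - 1) % n] + b[i] * x[i] + c[i] * x[(i + 1) % n]
            for i in range(n)]
```

A quick check is to solve a small periodic system, for example with a = c = [-1.0] * 6 and b = [4.0] * 6, and confirm that cyclic_matvec(a, b, c, x) reproduces the right-hand side up to rounding. The cost is two tridiagonal solves per system; when the same matrix is reused in every smoothing step, the solve for q could be computed once and stored.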
