Scalable Domain-decomposed Monte Carlo Neutral Transport for Nuclear Fusion
Oskar Lappi, Huw Leggate, Yannick Marandet, Jan Åström, Keijo Heljanko, Dmitriy V. Borodin
TL;DR
This work tackles memory bottlenecks in large-scale Monte Carlo neutral transport by introducing a domain-decomposed Monte Carlo (DDMC) algorithm implemented in the open-source code Eiron. Three parallel strategies (domain replication, shared memory, and asynchronous DDMC) are compared in strong scaling tests, with DDMC performing better than the other two in nearly all cases. On the supercomputer Mahti, DDMC strong scaling is superlinear for grids that do not fit into a 4 MiB L3 cache slice, and in weak scaling tests DDMC runs on up to 16384 cores with an efficiency of 45% in a high-collisional case and 26% in a low-collisional case. Because the grid is split across nodes, DDMC makes simulations feasible at grid resolutions that exceed a single node's memory, such as high-resolution, Larmor-scale fusion edge turbulence. The study suggests integrating DDMC into EIRENE to expand the feasible design envelope for next-generation devices like ITER and to motivate GPU porting for further acceleration.
Abstract
EIRENE [1] is a Monte Carlo neutral transport solver heavily used in the fusion community. EIRENE does not implement domain decomposition, making it impossible to use for simulations where the grid data does not fit on one compute node (see e.g. [2]). This paper presents a domain-decomposed Monte Carlo (DDMC) algorithm implemented in a new open source Monte Carlo code, Eiron. Two parallel algorithms currently used in EIRENE are also implemented in Eiron, and the three algorithms are compared by running strong scaling tests, with DDMC performing better than the other two algorithms in nearly all cases. On the supercomputer Mahti [3], DDMC strong scaling is superlinear for grids that do not fit into an L3 cache slice (4 MiB). The DDMC algorithm is also scaled up to 16384 cores in weak scaling tests, with a weak scaling efficiency of 45% in a high-collisional (heavier compute load) case, and 26% in a low-collisional (lighter compute load) case. We conclude that implementing this domain decomposition algorithm in EIRENE would improve performance and enable simulations that are currently impossible due to memory constraints.
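For readers less familiar with the scaling metrics quoted above, the following is a minimal sketch of how strong and weak scaling efficiency are commonly defined. The function names and the timings are hypothetical illustrations, not Eiron's API and not measurements from the paper; the paper's exact definitions may differ in detail.

```python
def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Fixed total problem size: measured speedup divided by ideal speedup.

    t_ref, n_ref: runtime and core count of the reference run.
    t_n, n:       runtime and core count of the scaled-up run.
    A value above 1.0 indicates superlinear scaling.
    """
    speedup = t_ref / t_n
    ideal_speedup = n / n_ref
    return speedup / ideal_speedup


def weak_scaling_efficiency(t_ref, t_n):
    """Problem size grows in proportion to core count, so the ideal
    runtime stays constant and efficiency is simply t_ref / t_n."""
    return t_ref / t_n


# Hypothetical timings in seconds, chosen only to illustrate the arithmetic.
print(strong_scaling_efficiency(100.0, 128, 10.0, 1024))  # 1.25 (superlinear)
print(weak_scaling_efficiency(45.0, 100.0))               # 0.45
```

Under these definitions, a weak scaling efficiency of 45% means the largest run took a little over twice as long as the reference run, although the work per core was unchanged.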
