Scalable Domain-decomposed Monte Carlo Neutral Transport for Nuclear Fusion

Oskar Lappi, Huw Leggate, Yannick Marandet, Jan Åström, Keijo Heljanko, Dmitriy V. Borodin

TL;DR

This work tackles memory bottlenecks in large-scale Monte Carlo neutral transport by introducing a domain-decomposed Monte Carlo (DDMC) approach implemented in the open-source code Eiron. Three parallel strategies (domain replication, shared memory, and asynchronous DDMC) are compared in strong scaling tests, and DDMC outperforms the other two in nearly all cases, with superlinear strong scaling on the supercomputer Mahti for grids that do not fit into a 4 MiB L3 cache slice. In weak scaling tests up to 16384 cores, DDMC reaches an efficiency of 45% in the high-collisional case and 26% in the low-collisional case. Because DDMC removes the requirement that the grid data fit on one compute node, it enables simulations that were previously infeasible, such as high-resolution, Larmor-scale fusion edge turbulence. The study recommends integrating DDMC into EIRENE, which would expand the feasible design envelope for next-generation devices like ITER, and it motivates GPU porting for further acceleration.

Abstract

EIRENE [1] is a Monte Carlo neutral transport solver heavily used in the fusion community. EIRENE does not implement domain decomposition, making it impossible to use for simulations where the grid data does not fit on one compute node (see e.g. [2]). This paper presents a domain-decomposed Monte Carlo (DDMC) algorithm implemented in a new open source Monte Carlo code, Eiron. Two parallel algorithms currently used in EIRENE are also implemented in Eiron, and the three algorithms are compared by running strong scaling tests, with DDMC performing better than the other two algorithms in nearly all cases. On the supercomputer Mahti [3], DDMC strong scaling is superlinear for grids that do not fit into an L3 cache slice (4 MiB). The DDMC algorithm is also scaled up to 16384 cores in weak scaling tests, with a weak scaling efficiency of 45% in a high-collisional (heavier compute load) case, and 26% in a low-collisional (lighter compute load) case. We conclude that implementing this domain decomposition algorithm in EIRENE would improve performance and enable simulations that are currently impossible due to memory constraints.

Paper Structure

This paper contains 20 sections, 9 equations, 3 figures, 1 table, 5 algorithms.

Figures (3)

  • Figure 1: Log-log plots of the strong scaling efficiency for all three parallel algorithms and all six grid resolutions in the low-collisional case (a) and the high-collisional case (b). The strong scaling efficiency at $n$ CPU cores is defined as $t_1/(n\cdot t_n)$, where $t_m$ is the wallclock time when running with $m$ CPU cores. Higher is better.
  • Figure 2: Log-log plots of weak scaling efficiency for the DDMC algorithm with $256^2$ subdomain resolution. The weak scaling efficiency at $n$ CPU cores is defined as $t_1/t_n$, where $t_m$ is the wallclock time when running with $m$ CPU cores. Periodic boundaries and an area source covering the entire grid were used to make sure that the work per core is constant. (a) is the low-collisional case, (b) is the high-collisional case. Higher is better.
  • Figure 3: Log-log plot of particle processing rate as a function of subdomain resolution using the DDMC strong scaling data. Higher is better. We have left out the non-square results for easier comparison to the cache model table, and to remove subdomain shape as a potential confounding factor.
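
The two efficiency definitions used in the captions of Figures 1 and 2 can be written as a minimal Python sketch. The function names and the timings in the usage lines are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def strong_scaling_efficiency(t1: float, tn: float, n: int) -> float:
    """Strong scaling efficiency t_1 / (n * t_n), as defined for Figure 1.

    t1 is the wallclock time on 1 core and tn the wallclock time on n cores
    for the same total problem size. A value above 1 is superlinear.
    """
    if n <= 0 or t1 <= 0 or tn <= 0:
        raise ValueError("core count and timings must be positive")
    return t1 / (n * tn)


def weak_scaling_efficiency(t1: float, tn: float) -> float:
    """Weak scaling efficiency t_1 / t_n, as defined for Figure 2.

    The work per core is held constant, so ideal scaling gives tn == t1.
    """
    if t1 <= 0 or tn <= 0:
        raise ValueError("timings must be positive")
    return t1 / tn


# Hypothetical timings in seconds (illustrative only, not from the paper).
strong = strong_scaling_efficiency(t1=100.0, tn=0.5, n=128)
weak = weak_scaling_efficiency(t1=9.0, tn=20.0)
print(f"strong scaling efficiency: {strong:.4f}")
print(f"weak scaling efficiency:   {weak:.2f}")
```

With these made-up inputs the strong scaling value exceeds 1, which is what "superlinear" means in the abstract: the run on $n$ cores is more than $n$ times faster than the single-core run.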