SCALE-TRACK: Asynchronous Euler-Lagrange particle tracking on heterogeneous computing architecture

Silvio Schmalfuß, Sergey Lesnik, Henrik Rusche, Dennis Niedermeier

Abstract

Euler-Lagrange (EL) simulations provide a direct and robust framework for modeling disperse multiphase flows. However, they are computationally expensive. While various approaches have attempted to leverage heterogeneous computing architectures, they have encountered scalability limitations. We present SCALE-TRACK, a scalable two-way coupled EL particle tracking algorithm designed to exploit heterogeneous exascale computing environments. With asynchronous coupling, cache-friendly data structures, and chunk-based partitioning, we address key limitations of existing EL implementations. Validations against an analytical solution and a conventional EL implementation demonstrate the accuracy of the proposed algorithms. On a local workstation equipped with a single graphics processing unit (GPU), we simulated a test case with 1.4 billion particles. Scaling runs on a high-performance computing (HPC) cluster show excellent strong and weak scaling, with up to 256 billion particles tracked on up to 256 GPUs. This represents a significant advancement for EL simulations, enabling high-fidelity simulations on local workstations and pushing the limits on HPC systems. The software is released as open source and is publicly available.

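A short sketch helps make "cache-friendly data structures" and "chunk-based partitioning" concrete. The C++ fragment below is purely illustrative: the structure-of-arrays (SoA) layout is a standard technique for this purpose, and all names (ParticleChunk, LagrangianPartition) as well as the chunk capacity are hypothetical, not SCALE-TRACK's actual data structures.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of a chunk-based, structure-of-arrays (SoA) particle
// store. It only illustrates why SoA chunks are cache-friendly: each
// property lives in its own contiguous array, and a fixed chunk size gives
// a natural unit for partitioning and for offloading work to a GPU.
struct ParticleChunk
{
    static constexpr std::size_t capacity = 4096; // assumed chunk size

    // One contiguous array per property instead of an array of particle
    // structs, so a loop touching only positions streams through memory.
    std::vector<double> x, y, z; // positions
    std::vector<double> u, v, w; // velocities
    std::vector<double> d;       // diameters

    ParticleChunk()
    {
        for (auto* a : {&x, &y, &z, &u, &v, &w, &d})
            a->reserve(capacity);
    }

    bool full() const { return x.size() == capacity; }
};

// A Lagrangian partition is then simply a collection of chunks; chunks can
// be created, destroyed, or migrated independently of the Eulerian mesh
// decomposition (cf. Figure 1b/c).
using LagrangianPartition = std::vector<ParticleChunk>;
```

With such a layout, a kernel that updates only positions does not stride over unused fields, and fixed-size chunks provide a natural unit for growing, shrinking, or migrating Lagrangian partitions independently of the Eulerian decomposition.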

Figures (8)

  • Figure 1: Domain decomposition strategies, showing Eulerian partitions via colors and Lagrangian partitions L1 to L4 via different hatching styles. a) Eulerian and Lagrangian domains are identical (the conventional and most common approach). b) Lagrangian partitions do not necessarily coincide with Eulerian ones. Additionally, they may move, grow and shrink. c) The same as in b), but Lagrangian partitions can overlap.
  • Figure 2: A schematic of SCALE-TRACK's execution timeline with an exemplary setup comprising two GPUs A and B, two primary CPU cores (A and D), and two secondary CPU cores (B and C). Blue blocks denote the actual computation routines, light green blocks denote transfers between CPUs and GPUs, dark green blocks denote MPI communication, and blue arrows denote OpenFOAM communications. Block sizes do not correspond to computational time. (T: thread; bb: bounding box). A minimal code sketch of the asynchronous coupling pattern behind this timeline follows the figure list.
  • Figure 3: Errors of the momentum relative to an analytical solution for the conventional EL approach and different solution strategies for asynchronous coupling. Top: Eulerian phase; Bottom: Lagrangian phase.
  • Figure 4: Schematic of a convection cloud chamber: a buoyancy-driven flow is induced by the combination of a cool (blue, $T_{cool} = 283\,\mathrm{K}$) and a warm (red, $T_{warm} = 293\,\mathrm{K}$) boundary. The walls at the front (cold) and at the right (warm) are removed for illustration purposes. All walls are water-saturated. Droplets, indicated by the spheres (not to scale), move through the domain and grow and shrink depending on the humidity they encounter.
  • Figure 5: Snapshots of a cloud chamber simulation. The left-hand side shows the whole simulation domain, with the two warm side walls and the cold top wall removed for visibility, together with a random selection of droplets. The black cuboid on the left-hand side is shown enlarged on the right-hand side, giving a more detailed view of the droplets. The droplets are scaled according to their diameter, but magnified.
  • ...and 3 more figures
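
The timeline in Figure 2 can be condensed into a pattern: while the Eulerian solve for the next step runs, the Lagrangian tracking for the current field runs concurrently, and the two exchange source terms with a one-step lag. The sketch below is a minimal, self-contained illustration of that pattern using std::async on the CPU; the function names and placeholder bodies are assumptions, not SCALE-TRACK's implementation (which offloads tracking to GPUs and communicates via MPI).

```cpp
#include <future>
#include <vector>

// Placeholder field type; a real solver would hold mesh-based data.
struct Field { std::vector<double> data; };

// Stand-in for the Eulerian (carrier-phase) solve. Hypothetical.
Field solveEulerian(const Field& sources)
{
    Field f{sources.data};
    for (auto& v : f.data) v += 1.0; // placeholder for a PDE step
    return f;
}

// Stand-in for Lagrangian tracking against a frozen carrier field,
// returning momentum source terms. Hypothetical.
Field trackLagrangian(const Field& carrier)
{
    Field s{carrier.data};
    for (auto& v : s.data) v *= 0.5; // placeholder for particle integration
    return s;
}

int main()
{
    Field carrier{std::vector<double>(1000, 0.0)};
    Field sources{std::vector<double>(1000, 0.0)};

    for (int step = 0; step < 10; ++step)
    {
        // Track particles against a snapshot of the current carrier field;
        // this runs concurrently with the next Eulerian solve.
        std::future<Field> lag =
            std::async(std::launch::async, trackLagrangian, carrier);

        // The Eulerian solve consumes source terms from the previous step,
        // i.e. the two phases are coupled with a one-step lag.
        carrier = solveEulerian(sources);

        sources = lag.get(); // synchronize at the end of the time step
    }
}
```

The lagged exchange is what makes the coupling asynchronous: neither phase waits for the other within a time step. The accuracy cost of such strategies is exactly what Figure 3 quantifies, comparing different solution strategies for asynchronous coupling against an analytical solution and the conventional EL approach.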