LEDDS: Portable LBM-DEM simulations on GPUs

Raphael Maggio-Aprile, Maxime Rambosson, Christophe Coreixas, Jonas Latt

TL;DR

LEDDS tackles the challenge of portable, high-performance CFD-DEM simulations on GPUs by expressing all computations as a small set of algorithmic primitives, enabling device-agnostic yet efficient execution. The framework couples LBM and DEM through the partially saturated cell (PSC) approach and supports both spherical and ellipsoidal particles on a uniform grid, with DEM and fluid steps unified under map, sort, and reduce primitives. Validation spans DEM-only and LBM-DEM benchmarks, showing energy and momentum conservation, correct Jeffery orbits for ellipsoids, and accurate rheology predictions, while performance analyses demonstrate strong GPU speedups and performance on par with optimized CUDA-based solvers. Collectively, LEDDS provides a blueprint for portable, readable, and scalable multiphysics software on heterogeneous architectures, with clear paths toward distributed multi-GPU implementations and more complex particle geometries.

Abstract

Algorithmic formulations of GPU programs provide a high-level alternative to device-specific code by expressing computations as compositions of well-defined parallel primitives (e.g., map, sort, reduce), rather than through handcrafted GPU kernels. In this work, we demonstrate that this paradigm can be extended to complex and challenging problems in computational physics: the simulation of granular flows and fluid-particle interactions. LEDDS, our open-source framework, performs fully coupled Lattice Boltzmann-Discrete Element Method (LBM-DEM) simulations using only algorithmic primitives, and runs efficiently on single-GPU platforms. The entire workflow, including neighbor search, collision detection, and fluid-particle coupling, is expressed as a sequence of portable primitives. While the current implementation illustrates these principles primarily through algorithms from the C++ Standard Library, with selective use of Thrust primitives for performance, the underlying concept is compatible with any HPC environment offering a rich set of parallel algorithms and is therefore applicable across a wide range of modern GPU systems and future accelerators. LEDDS is validated through benchmarks spanning both DEM and LBM-DEM configurations, including sphere and ellipsoid collisions, wall friction tests, single-particle settling, Jeffery's orbits, and particle-laden shear flows. Despite its high level of abstraction, LEDDS achieves performance comparable to that of hand-tuned CUDA solvers, while maintaining portability and code clarity. These results show that high-performance LBM-DEM coupling can be achieved without sacrificing generality or readability, establishing LEDDS as a blueprint for portable multiphysics frameworks based on algorithmic primitives.
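
To make the notion of an algorithmic primitive concrete, the following minimal sketch combines a map and a reduce in a single C++ Standard Library call. It is not taken from LEDDS: the `Particle` record, its field names, and the function `total_kinetic_energy` are hypothetical and chosen only for illustration. The call is written without an execution policy so that it compiles with any C++17 toolchain; the standard also provides an overload that takes a policy such as `std::execution::par_unseq` as its first argument.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical particle record; the field names are illustrative only
// and are not taken from the LEDDS source code.
struct Particle {
    double mass;
    double vx, vy, vz;
};

// Map + reduce as one primitive: each particle is mapped to its
// translational kinetic energy, and the mapped values are summed.
double total_kinetic_energy(const std::vector<Particle>& particles) {
    return std::transform_reduce(
        particles.begin(), particles.end(),
        0.0,             // initial value of the reduction
        std::plus<>{},   // reduce: sum of the mapped values
        [](const Particle& p) {  // map: particle -> kinetic energy
            assert(p.mass >= 0.0);  // a negative mass signals corrupted input
            return 0.5 * p.mass * (p.vx * p.vx + p.vy * p.vy + p.vz * p.vz);
        });
}
```

The body contains no explicit loop and no device-specific code: the iteration order and the mapping to hardware are left to the library, which is the property the abstract relies on for portability.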

Paper Structure

This paper contains 32 sections, 62 equations, 23 figures, 2 tables.

Figures (23)

  • Figure 1: 2D schematic of the coupled particle-fluid simulation. Bold lines indicate the DEM (cell-linked list) grid used for collision detection, thin lines show the LBM fluid grid, and semi-transparent circles show the particles. The relative 1:3 spacing between the grids is for illustration; in practice, the ratio depends on the largest particle size. The algorithm requires the grid spacing to be larger than the particle diameter to guarantee an overlap of a particle with at most 8 cells (in 3D).
  • Figure 2: Intersection plane which characterizes a collision between two spheres. The tangential force is integrated in time along the direction $\mathbf{e}_{t_{ij}}$.
  • Figure 3: Overview of the LEDDS workflow from the particle perspective. Colored groups highlight the three main stages: particle/grid updates (blue), DEM force computation (red), and fluid-particle coupling (green), corresponding to the subsections on grid updates, DEM forces, and fluid-particle coupling, respectively. For pure DEM simulations, the third stage disappears from the workflow.
  • Figure 4: Illustration of the 2D uniform grid. Spheres $i$ and $j$ overlap the cells $(2,2)$ and $(2,3)$. Sphere $i$ is considered present in cells $(0,1)$ and $(0,3)$ due to the use of the AABB.
  • Figure 5: Three-step procedure for the global computation of all unique collision pairs, illustrated with an example (solids are represented by letters instead of numbers for better readability). Steps (a) and (b) are local per-cell operations. Step (c) sorts the collision pairs and makes them unique in a global operation.
  • ...and 18 more figures
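
The global step described in the caption of Figure 5 (sort the collision pairs, then make them unique) can be sketched with two standard algorithms. The sketch below rests on assumptions: the caption does not spell out steps (a) and (b), so they are modeled here as a plain per-cell enumeration of candidate pairs, written as serial loops for brevity rather than as primitives. The names `Pair`, `cells`, and `unique_collision_pairs` are hypothetical, and solids are identified by integer ids.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A candidate collision pair, stored as (smaller id, larger id).
using Pair = std::pair<int, int>;

// cells[c] lists the ids of the solids that overlap grid cell c
// (hypothetical layout, assumed for this sketch only).
std::vector<Pair> unique_collision_pairs(
    const std::vector<std::vector<int>>& cells) {
    std::vector<Pair> pairs;

    // Local per-cell work (assumed form of the per-cell steps):
    // enumerate every pair of solids that share a cell.
    for (const auto& ids : cells) {
        for (std::size_t a = 0; a < ids.size(); ++a) {
            for (std::size_t b = a + 1; b < ids.size(); ++b) {
                assert(ids[a] != ids[b]);  // a solid is listed once per cell
                pairs.emplace_back(std::min(ids[a], ids[b]),
                                   std::max(ids[a], ids[b]));
            }
        }
    }

    // Global step: two solids that share several cells produce the same
    // pair several times. Sorting places duplicates next to each other,
    // and std::unique then removes the adjacent repeats.
    std::sort(pairs.begin(), pairs.end());
    pairs.erase(std::unique(pairs.begin(), pairs.end()), pairs.end());
    return pairs;
}
```

Storing each pair with the smaller id first means that (i, j) and (j, i) compare equal after sorting, so a pair found in several cells is reported once.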