Integrating Odeint Time Stepping into OpenFPM for Distributed and GPU Accelerated Numerical Solvers

Abhinav Singh; Landfried Kraatz; Serhii Yaskovets; Pietro Incardona; Ivo F. Sbalzarini

Abstract

We present a software implementation integrating the time-integration library Odeint from Boost with the OpenFPM framework for scalable scientific computing. This enables compact and scalable codes for multi-stage, multi-step, and adaptive explicit time integration on distributed-memory parallel computers and on Graphics Processing Units (GPUs). The implementation extends OpenFPM's metaprogramming system to Odeint data types, which makes the time-integration methods from Odeint available in a concise template-expression language for numerical simulations that are distributed and parallelized using OpenFPM. We benchmark the software on exponential and sigmoidal dynamics and present applications to the 3D Gray-Scott reaction-diffusion problem and to the "dam break" problem from fluid mechanics. We find a strong-scaling efficiency of 80% on up to 512 CPU cores and a five-fold speedup on a single GPU.
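The abstract attributes the concise syntax to extending OpenFPM's template-expression (metaprogramming) system to Odeint data types. As a rough illustration of what expression templates buy inside a time stepper, here is a minimal, self-contained sketch, assuming a plain `std::vector<double>` state: arithmetic on `State` builds lazy `Sum` and `Scaled` nodes, and an update such as `u = u + dt * k` is evaluated in one fused loop without temporary vectors. All class and function names are hypothetical; they are not OpenFPM's or Odeint's types, and the sketch says nothing about how OpenFPM handles distributed or GPU data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical expression-template sketch; names are illustrative only.

// CRTP base: every expression node can be downcast to its concrete type.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Concrete state: owns the data and evaluates expressions on assignment.
struct State : Expr<State> {
    std::vector<double> data;
    explicit State(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // One fused loop over all elements; no temporary vectors are created.
    template <typename E>
    State& operator=(const Expr<E>& e) {
        const E& ex = e.self();
        assert(ex.size() == size());
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = ex[i];
        return *this;
    }
};

// Lazy node for l + r: holds references and computes one element on demand.
template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
    const L& l;
    const R& r;
    Sum(const L& l_, const R& r_) : l(l_), r(r_) { assert(l.size() == r.size()); }
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

// Lazy node for a * r with a scalar a.
template <typename R>
struct Scaled : Expr<Scaled<R>> {
    double a;
    const R& r;
    Scaled(double a_, const R& r_) : a(a_), r(r_) {}
    double operator[](std::size_t i) const { return a * r[i]; }
    std::size_t size() const { return r.size(); }
};

template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

template <typename R>
Scaled<R> operator*(double a, const Expr<R>& r) {
    return Scaled<R>(a, r.self());
}

// Usage: explicit Euler for du/dt = -u with u(0) = 1 in every component.
State euler_decay(std::size_t n, double dt, int steps) {
    State u(n, 1.0), k(n, 0.0);
    for (int s = 0; s < steps; ++s) {
        k = -1.0 * u;    // right-hand side F(u) = -u
        u = u + dt * k;  // u <- u + dt * F(u), evaluated in one loop
    }
    return u;
}
```

Because the nodes store references, each expression should be consumed in the same statement that builds it; storing `u + dt * k` in an `auto` variable would leave a dangling reference to the temporary `Scaled` node.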

This paper contains 23 sections, 5 equations, 3 figures, 2 tables.

Paper Authors
Paper Author Roles and Affiliations
Abstract
Keywords
(1) Overview
Introduction
Implementation and architecture
Quality control
(2) Availability
Operating systems
Programming languages
Minimal hardware requirements
Software dependencies
List of code contributors
Software location:

Figures (3)

Figure 1: Architecture of the Odeint library. In each iteration, the state $\mathbf{u}$ is advanced from a time $t_1$ to a later time $t_2=t_1+\delta t$, $\delta t >0$. The right-hand side $\mathcal{F}(t,\mathbf{u}(t))$ of the ODE system is encapsulated in the System functor. An Algebra defines mathematical operations over the state type, i.e., the data type of the state. An optional Observer functor can execute user code at the beginning of every time step.
Figure 2: Numerical convergence of time-integration schemes with increasing numbers of time steps. (a,b) $L_\infty$ ($\blacklozenge$) and $L_2$ ($\bullet$) error norms for the exponential dynamics of Eq. (\ref{eq:exp}) (a) and the sigmoidal dynamics of Eq. (\ref{eq:sig}) (b) with different one-step multi-stage methods (colors, inset legend). Solid lines indicate the theoretically expected slopes. (c,d) $L_\infty$ ($\blacklozenge$) error norms for the exponential (c) and sigmoidal (d) dynamics solved using Adams-Bashforth methods with different numbers of steps ($1\ldots 8$, colors, inset legend). Solid lines indicate the theoretically expected slopes.
Figure 3: Strong scaling of the OpenFPM+Odeint time-integration schemes with increasing numbers of CPU cores and on a GPU. (a) Average wall-clock times in seconds over three independent runs for solving the exponential dynamics of Eq. (\ref{eq:exp}) using different one-step multi-stage methods (colors, inset legend; error bars are smaller than the symbols). The solid black line indicates ideal speedup. (b) Average wall-clock times in seconds over three independent runs for the 3D Gray-Scott problem of Eq. (\ref{eq:gs}) using different one-step multi-stage methods (colors, inset legend; error bars are smaller than the symbols). The solid black line indicates the speedup corresponding to 80% parallel efficiency. (c) Average wall-clock speedups for the SPH dam break case, normalized to the runtimes reported for the reference CPU implementation without Odeint from [INCARDONA2019155] (speedup = 1.0). All results are averaged over three independent runs (error bars show the standard deviation) for different numbers of particles (bar groups). Speedups are given for the OpenFPM+Odeint code on a single Nvidia GeForce RTX 4090 GPU (blue bars), for the code without the present Odeint interface on the same GPU (orange bars), and for the OpenFPM+Odeint code on 32 cores of an AMD Ryzen Threadripper 3990X CPU (green bars). (d) Visualization of the SPH dam break case with 1.2 million particles at time 0.43 s, with the fluid colored by velocity magnitude (color bar). The inset shows the OpenFPM domain decomposition onto four processes (one color per process subdomain).
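Figure 1 separates the right-hand side (System), the state type, the Algebra, the stepper, and the optional Observer. A standard-library-only sketch of that separation follows, assuming exponential dynamics $du/dt = \lambda u$ for the System (the paper's exact form of Eq. (\ref{eq:exp}) is not reproduced here). The System signature `(u, dudt, t)` and the Observer signature `(u, t)` follow Odeint's conventions; `VectorAlgebra`, `RK4Stepper`, `ExponentialSystem`, `CountingObserver`, and `integrate_n_steps_sketch` are hypothetical stand-ins, not Boost.Odeint classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the Figure 1 architecture; names are illustrative.
using state_type = std::vector<double>;

// Algebra: element-wise operations over the state type.
struct VectorAlgebra {
    // out = x + a * y
    static void assign_sum(state_type& out, const state_type& x, double a,
                           const state_type& y) {
        assert(x.size() == y.size() && out.size() == x.size());
        for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] + a * y[i];
    }
    // u += a * x
    static void axpy(state_type& u, double a, const state_type& x) {
        assert(u.size() == x.size());
        for (std::size_t i = 0; i < u.size(); ++i) u[i] += a * x[i];
    }
};

// Classical fourth-order Runge-Kutta; all vector arithmetic goes through Algebra.
template <typename Algebra = VectorAlgebra>
struct RK4Stepper {
    template <typename System>
    void do_step(const System& sys, state_type& u, double t, double dt) const {
        const std::size_t n = u.size();
        state_type k1(n), k2(n), k3(n), k4(n), tmp(n);
        sys(u, k1, t);
        Algebra::assign_sum(tmp, u, 0.5 * dt, k1);
        sys(tmp, k2, t + 0.5 * dt);
        Algebra::assign_sum(tmp, u, 0.5 * dt, k2);
        sys(tmp, k3, t + 0.5 * dt);
        Algebra::assign_sum(tmp, u, dt, k3);
        sys(tmp, k4, t + dt);
        // u <- u + dt/6 * (k1 + 2 k2 + 2 k3 + k4)
        Algebra::axpy(u, dt / 6.0, k1);
        Algebra::axpy(u, dt / 3.0, k2);
        Algebra::axpy(u, dt / 3.0, k3);
        Algebra::axpy(u, dt / 6.0, k4);
    }
};

// System functor: right-hand side F(t, u) = lambda * u.
struct ExponentialSystem {
    double lambda;
    void operator()(const state_type& u, state_type& dudt, double /*t*/) const {
        assert(dudt.size() == u.size());
        for (std::size_t i = 0; i < u.size(); ++i) dudt[i] = lambda * u[i];
    }
};

// Observer functor: counts how often it is called with (u, t).
struct CountingObserver {
    int calls = 0;
    void operator()(const state_type& /*u*/, double /*t*/) { ++calls; }
};

// Driver: n equal steps from t0 to t1; the observer runs before every step.
template <typename Stepper, typename System, typename Observer>
void integrate_n_steps_sketch(const Stepper& stepper, const System& sys,
                              state_type& u, double t0, double t1, int n,
                              Observer& obs) {
    const double dt = (t1 - t0) / n;
    for (int s = 0; s < n; ++s) {
        const double t = t0 + s * dt;
        obs(u, t);                       // user code at the start of the step
        stepper.do_step(sys, u, t, dt);  // advance from t to t + dt
    }
}

// Exact solution of du/dt = lambda * u, used to check the result.
double exact_exponential(double u0, double lambda, double t) {
    return u0 * std::exp(lambda * t);
}
```

For example, integrating $du/dt = -u$ from $t=0$ to $t=1$ in 100 steps calls the observer 100 times and agrees with `exact_exponential(1.0, -1.0, 1.0)` to well below $10^{-9}$. Replacing the algebra only requires providing the same two static operations, which is the kind of extension point Figure 1 describes.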
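The theoretically expected slopes in Figure 2 correspond to the convergence order $p$ of each scheme: halving $\delta t$ should reduce the error by a factor of about $2^p$. A short sketch estimating $p = \log_2\big(e(N)/e(2N)\big)$ for Adams-Bashforth with one step (explicit Euler) and two steps follows, assuming the linear test equation $du/dt = \lambda u$, $u(0)=1$, as a stand-in for the exponential dynamics and taking the maximum absolute error over all time levels. A multi-step method needs a start-up procedure; the single Heun step used here is an illustrative choice, and `max_error_ab` and `observed_order` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Maximum absolute error over all time levels for Adams-Bashforth with
// steps_ab = 1 (explicit Euler) or 2, solving u' = lambda * u, u(0) = 1 on [0, T].
double max_error_ab(int steps_ab, double lambda, double T, int n) {
    assert((steps_ab == 1 || steps_ab == 2) && n >= 1);
    const double dt = T / n;
    double u_prev = 1.0;  // solution at t_{k-1}
    double u = 1.0;       // solution at t_k
    double err = 0.0;
    int k0 = 0;
    if (steps_ab == 2) {
        // Start-up: one Heun (explicit trapezoidal) step, second-order accurate.
        const double f0 = lambda * u;
        const double f1 = lambda * (u + dt * f0);
        u = u + 0.5 * dt * (f0 + f1);
        err = std::fabs(u - std::exp(lambda * dt));
        k0 = 1;
    }
    for (int k = k0; k < n; ++k) {
        const double u_next =
            (steps_ab == 1)
                ? u + dt * lambda * u                                     // AB1
                : u + dt * (1.5 * lambda * u - 0.5 * lambda * u_prev);    // AB2
        u_prev = u;
        u = u_next;
        err = std::max(err, std::fabs(u - std::exp(lambda * (k + 1) * dt)));
    }
    return err;
}

// Observed order from the error ratio between n and 2n steps.
double observed_order(int steps_ab, double lambda, double T, int n) {
    return std::log2(max_error_ab(steps_ab, lambda, T, n) /
                     max_error_ab(steps_ab, lambda, T, 2 * n));
}
```

With $\lambda = 1$, $T = 1$, and $N = 100$, the estimated orders come out close to 1 and 2, matching the slopes expected for one- and two-step Adams-Bashforth.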
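Panels (a) and (b) of Figure 3 compare measured wall-clock times against reference lines for ideal and 80% parallel efficiency. A minimal sketch of the standard strong-scaling bookkeeping follows, assuming a baseline run with time $T_{\mathrm{ref}}$ on $p_{\mathrm{ref}}$ cores: speedup $S = T_{\mathrm{ref}}/T_p$, efficiency $E = T_{\mathrm{ref}}\,p_{\mathrm{ref}}/(T_p\,p)$, and the reference line for efficiency $E$ is $T = T_{\mathrm{ref}}\,p_{\mathrm{ref}}/(E\,p)$. The function names are hypothetical, and the sketch does not reproduce the paper's measurements.

```cpp
#include <cassert>

// Speedup of a run taking t_p seconds relative to a baseline taking t_ref.
double speedup(double t_ref, double t_p) {
    assert(t_p > 0.0);
    return t_ref / t_p;
}

// Parallel efficiency on p cores relative to a baseline on p_ref cores.
double parallel_efficiency(double t_ref, int p_ref, double t_p, int p) {
    assert(t_p > 0.0 && p > 0 && p_ref > 0);
    return (t_ref * p_ref) / (t_p * p);
}

// Wall-clock time on p cores that corresponds to a given efficiency:
// efficiency = 1.0 gives the ideal-speedup line, 0.8 the 80% line.
double reference_time(double t_ref, int p_ref, int p, double efficiency) {
    assert(efficiency > 0.0 && p > 0 && p_ref > 0);
    return t_ref * p_ref / (efficiency * p);
}
```

On a log-log plot of time versus cores, every constant-efficiency line has slope $-1$; lowering the efficiency only shifts the line upward by the factor $1/E$.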
