Table of Contents
Fetching ...

Pyroclast: A Modular High-Performance Python Solver for Geodynamics

Marcel Ferrari

Abstract

This monograph presents the design, implementation, and evaluation of Pyroclast, a modular high-performance Python framework for large-scale geodynamic simulations. Pyroclast addresses limitations of legacy geodynamics solvers, often implemented in monolithic Fortran, C++, or C codebases with limited GPU support and extensibility, by combining modern numerical methods, hardware-accelerated execution, and a flexible object-oriented architecture. Designed for distributed and GPU-accelerated environments, Pyroclast provides an accessible and efficient platform for simulating mantle convection and lithospheric deformation using the marker-in-cell method and a matrix-free finite difference discretization. The work focuses on a scalable two-dimensional viscous mechanical solver that forms the computational core for future visco-elasto-plastic models. The solver includes a stress-conservative staggered grid discretization of the incompressible Stokes equations, a matrix-free geometric multigrid solver, Krylov and quasi-Newton methods, and MPI-based domain decomposition for distributed execution. Benchmarks evaluate performance and scalability. Shared-memory tests show strong scaling of the Stokes solver and demonstrate a 5-10x speedup on NVIDIA A100 GPUs compared to a multi-core CPU baseline. Distributed advection benchmarks show near-ideal weak scaling up to 896 CPU cores across seven compute nodes. These results demonstrate that Pyroclast achieves high performance while remaining accessible through a high-level Python interface. The framework also provides a blueprint for modernizing legacy geodynamics codes. Its modular architecture and Python-native implementation lower the barrier to entry while enabling interoperability with modern machine learning libraries, enabling hybrid physics-based and data-driven workflows.

Pyroclast: A Modular High-Performance Python Solver for Geodynamics

Abstract

This monograph presents the design, implementation, and evaluation of Pyroclast, a modular high-performance Python framework for large-scale geodynamic simulations. Pyroclast addresses limitations of legacy geodynamics solvers, often implemented in monolithic Fortran, C++, or C codebases with limited GPU support and extensibility, by combining modern numerical methods, hardware-accelerated execution, and a flexible object-oriented architecture. Designed for distributed and GPU-accelerated environments, Pyroclast provides an accessible and efficient platform for simulating mantle convection and lithospheric deformation using the marker-in-cell method and a matrix-free finite difference discretization. The work focuses on a scalable two-dimensional viscous mechanical solver that forms the computational core for future visco-elasto-plastic models. The solver includes a stress-conservative staggered grid discretization of the incompressible Stokes equations, a matrix-free geometric multigrid solver, Krylov and quasi-Newton methods, and MPI-based domain decomposition for distributed execution. Benchmarks evaluate performance and scalability. Shared-memory tests show strong scaling of the Stokes solver and demonstrate a 5-10x speedup on NVIDIA A100 GPUs compared to a multi-core CPU baseline. Distributed advection benchmarks show near-ideal weak scaling up to 896 CPU cores across seven compute nodes. These results demonstrate that Pyroclast achieves high performance while remaining accessible through a high-level Python interface. The framework also provides a blueprint for modernizing legacy geodynamics codes. Its modular architecture and Python-native implementation lower the barrier to entry while enabling interoperability with modern machine learning libraries, enabling hybrid physics-based and data-driven workflows.
Paper Structure (108 sections, 85 equations, 23 figures, 6 algorithms)

This paper contains 108 sections, 85 equations, 23 figures, 6 algorithms.

Figures (23)

  • Figure 1: High-level simulation loop used in the marker-in-cell method. Material properties such as viscosity and density are initialized on Lagrangian markers and interpolated to the Eulerian grid. The governing equations are solved on the grid, and the resulting properties (e.g., velocity, temperature) are interpolated back to the markers. The markers are then advected, and the loop continues until the maximum simulation time is reached.
  • Figure 2: Schematic of linear interpolation between markers (blue circles) and grid nodes (black squares) in two dimensions. The red marker is located inside a grid cell bounded by four nodes. Distances $r_x$ and $r_y$ are used to compute interpolation weights.
  • Figure 3: Staggered grid arrangement used for the finite-difference discretization of the incompressible Stokes equations. The velocity components $v_x$ (green triangles) and $v_y$ (blue diamonds) are defined at cell faces, while pressure $p$ (red circles) is stored at cell centers. Black squares mark the locations of basic nodes that carry auxiliary quantities such as viscosity and density. Boundary nodes (B) extend the computational domain to enforce boundary conditions. Ghost nodes (G) hold no physical information, but ensure that all quantities can be represented by arrays of size $n_y + 1 \times n_x + 1$, greatly simplifying stencil-based operations and interpolation. This layout ensures a stress-conservative discretization where all spatial derivatives are approximated using central differences. The figure is adapted from Gerya_stokes_flow.
  • Figure 4: Stencil for the discretization of the $x$‑momentum equation on a staggered grid. The figure shows the indexing of the primary solution variables ($v_x$, $v_y$, and $p$); all other quantities stored at the same nodal locations (e.g., $\sigma'_{xx}$, $\sigma'_{xy}$, $\eta_P$, $\eta_B$, $\rho$) follow the same indexing convention. Velocity components $v_x$ and $v_y$ are defined on cell faces, pressure $p$ at cell centers, and the deviatoric stress components $\sigma'_{xx}$ and $\sigma'_{xy}$ at their respective natural positions. Viscosity is stored at the same nodes as the corresponding stress components ($\eta_P$ at pressure nodes and $\eta_B$ at basic nodes), ensuring a consistent stress‑conservative discretization. The figure is adapted from gerya_numerical_solution_of_stokesferrari_3d_blocking.
  • Figure 5: Illustration of information propagation using a local 5-point stencil (in blue) on a structured grid. The red diamond marks a source of information in the top-left corner. Each stencil application updates a node based on its immediate North, South, East, and West neighbors. In Jacobi iteration, all updates use values from the previous sweep, meaning information can only advance by one grid edge per iteration (red arrows). Propagating information across the domain therefore requires a number of sweeps proportional to the grid diameter. Gauss-Seidel iteration improves on this by immediately using updated values, allowing information to chain through the grid within a single sweep. However, this chaining leads to gradual contamination of the signal, and the method is inherently serial, making it unsuitable for parallel execution. Red-Black Gauss-Seidel (RBGS) enables parallelism by splitting the grid and processing it in two alternating sweeps, but this also limits information flow. As a result, RBGS behaves similarly to Jacobi, requiring approximately half as many sweeps (i.e., diameter $/~2$) to propagate information across the grid.
  • ...and 18 more figures