Efficient N-to-M Checkpointing Algorithm for Finite Element Simulations

David A. Ham; Vaclav Hapla; Matthew G. Knepley; Lawrence Mitchell; Koki Sagiyama


TL;DR

This work develops an efficient N-to-M checkpointing algorithm for finite element simulations: meshes, function spaces, and functions saved on N parallel processes can be loaded on M processes, where M need not equal N. Inter-process mappings are represented as star forests, and the workflow is implemented in PETSc and exposed in Firedrake through a new CheckpointFile API, so that functions are reconstructed correctly on the loaded mesh however it is redistributed. The method is validated for correctness across multiple finite element families and dimensions, and is shown to scale on ARCHER2 to billions of degrees of freedom (DoFs), with a detailed I/O performance analysis. In practice, this enables multi-session and post-processing workflows for large-scale finite element simulations without tying the loading process count to the saving one. Future work aims to let loaded meshes inherit the global numbering, so that the exact loading distribution is preserved across repeated checkpointing.
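As a rough illustration of the star forest idea mentioned above, the following pure-Python sketch models a star forest as a map from leaves (points held on some process) to roots (the owning copy of each point), and broadcasts root data to the leaves. The class `StarForest`, its `broadcast` method, and the sample data are hypothetical names chosen for this sketch; this is not the PETSc implementation, and the processes are simulated with dictionaries rather than MPI ranks.

```python
from dataclasses import dataclass


@dataclass
class StarForest:
    """Toy star forest: every leaf points at exactly one root.

    leaf_to_root maps (leaf_rank, leaf_index) -> (root_rank, root_index).
    A root may have any number of leaves, which gives the "star" shape.
    """
    leaf_to_root: dict

    def broadcast(self, root_data):
        """Copy each root value to every leaf attached to that root."""
        leaf_data = {}
        for (leaf_rank, leaf_idx), (root_rank, root_idx) in self.leaf_to_root.items():
            leaf_data.setdefault(leaf_rank, {})[leaf_idx] = root_data[root_rank][root_idx]
        return leaf_data


# Root data held by two simulated processes.
root_data = {0: [10.0, 11.0], 1: [12.0]}

# Leaves on two simulated processes; root (1, 0) has two leaves.
sf = StarForest(
    leaf_to_root={
        (0, 0): (0, 0),
        (0, 1): (1, 0),
        (1, 0): (1, 0),
        (1, 1): (0, 1),
    }
)

leaf_data = sf.broadcast(root_data)
```

In this toy setting, the leaf at index 1 on process 0 and the leaf at index 0 on process 1 both receive the value stored at root 0 of process 1, which is the kind of one-to-many mapping that arises when a mesh entity is shared between partitions.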

Abstract

We introduce a new algorithm for N-to-M checkpointing in finite element simulations. The algorithm allows efficient saving and loading of functions that represent physical quantities and are associated with the mesh representing the physical domain. In particular, different numbers of parallel processes can be used for saving and loading, so a simulation can be restarted or post-processed on the process count appropriate to that phase of the simulation and to other conditions. For demonstration, we implemented this algorithm in PETSc, the Portable, Extensible Toolkit for Scientific Computation, and added a convenient high-level interface to Firedrake, a system for solving partial differential equations using finite element methods. We evaluated our implementation by saving and loading data involving 8.2 billion finite element degrees of freedom using 8,192 parallel processes on ARCHER2, the UK National Supercomputing Service.
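The core N-to-M idea can be sketched in a few lines of pure Python, assuming that every local degree of freedom carries a global number. Each saving process writes the entries it owns into a global array, and each loading process reads the entries it needs through its own local-to-global map. The functions `save` and `load`, the partition layouts, and the values are all hypothetical and chosen for illustration; the list `saved` stands in for the stored array, and none of this reproduces the parallel I/O or the PETSc and Firedrake code paths described in the paper.

```python
def save(local_to_global, owned, local_values, n_global):
    """Assemble a global vector from the entries each process owns."""
    global_values = [None] * n_global
    for l2g, own, vals in zip(local_to_global, owned, local_values):
        for local, g in enumerate(l2g):
            if own[local]:  # ghost (non-owned) entries are skipped
                global_values[g] = vals[local]
    return global_values


def load(global_values, local_to_global):
    """Fill each loading process's local vector via its local-to-global map."""
    return [[global_values[g] for g in l2g] for l2g in local_to_global]


# Saving on N = 2 simulated processes, each with one ghost entry (value -1.0).
save_l2g = [[0, 1, 2, 3], [3, 4, 5, 2]]
save_owned = [[True, True, True, False], [True, True, True, False]]
save_values = [[0.0, 10.0, 20.0, -1.0], [30.0, 40.0, 50.0, -1.0]]
saved = save(save_l2g, save_owned, save_values, n_global=6)

# Loading on M = 3 simulated processes with a different, overlapping partition.
load_l2g = [[0, 1, 2], [2, 3], [5, 4, 3]]
loaded = load(saved, load_l2g)
```

Because both phases refer only to global numbers, the loading partition is free to differ from the saving one in process count, in overlap, and in local ordering.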


Paper Structure (31 sections, 28 equations, 11 figures, 5 tables)


Introduction
Saving and loading meshes, function spaces, and functions
Meshes
Saving mesh topology
Loading mesh topology
Function spaces and functions
Local DoF vector on the mesh topology before saving
Local discrete function space data on the mesh topology before saving
Saving global DoF vectors
Saving global discrete function space data
Loading global discrete function space data
Loading global DoF vectors
Notes
Reconstructing function spaces and functions on the loaded mesh
Local discrete function space data on the loaded mesh topology
...and 16 more sections

Figures (11)

Figure 1: An example of cone-preserving mesh distribution. Cone orderings are indicated by the arrows. \ref{Fi:mesh_global} Global numbering of the original topology. \ref{Fi:mesh_saved} Distributed topology in local numbering before saving. Entities owned by processes 0 and 1 are enclosed by dotted lines filled with white and blue, respectively. The topology from \ref{Fi:mesh_global} has been distributed over two processes: 0 (left) and 1 (right). On each process, entities are relabeled with local numbers so that each local number is associated with one global number. The ordering of the cone of each entity is preserved. Cones of all entities in global numbering are saved. \ref{Fi:mesh_loaded} Loaded and redistributed topology. Entities owned by processes 0, 1, and 2 are enclosed by dotted lines filled with white, blue, and orange, respectively. The topology data are loaded in parallel and redistributed with some partition overlap over three processes. On each process, entities are relabeled with local numbers. The orderings of the cones are preserved in the save-load cycle as indicated by the preserved directions of arrows when compared to \ref{Fi:mesh_saved}. \ref{Fi:mesh_loaded_num} Array view of the topology in \ref{Fi:mesh_loaded}. The set of local numbers ${I}_T^{m}$ and the arrays containing the associated global numbers $\textsf{LocG}_T^{m}$ for process $m\in\{0,1,2\}$. Parts owned by processes 0, 1, and 2 are shown in white, blue, and orange, respectively.
Figure 1: DAG representation of the parallel mesh topology shown in \ref{Fi:mesh_loaded} (repeated in \ref{Fi:mesh_loaded_copy}) that one can represent with a parallel DMPlex. Local numbers are shown for local topologies on processes 0, 1, and 2, from left to right. Entities owned by processes 0, 1, and 2 are shown in white, blue, and orange. The cell with local number 0 and its cone on process 1 are highlighted.
Figure 1: Orientation with respect to the FIAT reference element. Cone orientations are denoted with arrows. \ref{Fi:orientation_fiat_cones} (Left) the reference element, (right) an example cell taken from \ref{Fi:function_dofs_loaded}, and (middle) the example cell rotated once counter-clockwise to match the reference triangle. \ref{Fi:orientation_fiat_dofs} DoF layouts associated with the P4 finite element in (left) the FIAT reference cell, (right) the example cell, and (middle) the example cell rotated once counter-clockwise.
Figure 2: A schematic of the map $\chi_{{I}_T}^{{L}_P}$ from the local numbers of the loaded mesh topology, ${I}_T$, to the global numbers, ${L}_P$, for the example depicted in \ref{Fi:mesh}, specifically \ref{Fi:mesh_loaded_num}. Parts owned by processes 0, 1, and 2 are shown in white, blue, and orange, respectively. Only maps from ${I}_T^{(0)}$ and ${I}_T^{(2)}$ to ${L}_P$ are depicted for simplicity. This map is to be composed with the inverse of $\chi_{{I}_P}^{{L}_P}$ depicted in \ref{Fi:function_map}.
Figure 3: Example illustrating the cone ordering and the DoF ordering of a one-dimensional one-element mesh. (Left) A one-dimensional one-element mesh with the cone of the cell entity ordered so that the right vertex comes first and then the left vertex, which is indicated by the arrow. (Middle) A schematic of the DP2 function space with three DoFs, $\sigma_0$, $\sigma_1$, and $\sigma_2$, ordered in the cell in a deterministic way based on the cone ordering. (Right) A schematic of the DP2 function space with DoFs wrongly ordered according to our definition. The DoF values in the DoF vector will be associated with the wrong DoFs.
...and 6 more figures
