Step-based checkpointing with high-level algorithmic differentiation

James R. Maddison

TL;DR

This article considers the combination of high-level algorithmic differentiation with step-based checkpointing schedules, with the primary application being for solvers of time-dependent partial differential equations.

Abstract

Automated code generation allows for a separation between the development of a model, expressed via a domain-specific language, and lower-level implementation details. Algorithmic differentiation can be applied symbolically at the level of the domain-specific language, and the code generator reused to implement code required for an adjoint calculation. However, the adjoint calculations are complicated by the well-known problem of storing or recomputing the forward data required by the adjoint, and different checkpointing strategies have been developed to tackle this problem. This article considers the combination of high-level algorithmic differentiation with step-based checkpointing schedules, with the primary application being for solvers of time-dependent partial differential equations. The focus is on algorithmic differentiation using a dynamically constructed record of forward operations, where the precise structure of the original forward calculation is unknown ahead of time. In addition, high-level approaches provide a simplified view of the model itself. This allows data required to restart and advance the forward, and data required to advance the adjoint, to be identified. The difference between the two types of data is here leveraged to implement checkpointing strategies with improved performance.
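
The distinction between forward restart data and data required to advance the adjoint can be illustrated with a minimal sketch. The two-variable model, the functional (the final value of `u`), and the fixed-period schedule below are all hypothetical choices made for illustration; they are not the model or the schedules considered in the article. In this toy model `v` enters the step non-linearly and `u` only linearly, so restarting the forward needs both `u` and `v`, while advancing the adjoint needs only `v`.

```python
DT = 0.1  # hypothetical timestep size


def forward_step(u, v):
    """One forward step: v enters non-linearly, u only linearly."""
    return u + DT * v * v, v + DT * u


def adjoint_step(adj_u, adj_v, v):
    """Transpose-Jacobian action for one step.

    Only v (the non-linear dependency) is needed, not u.
    """
    return adj_u + DT * adj_v, 2.0 * DT * v * adj_u + adj_v


def gradient_with_checkpoints(u0, v0, n_steps, period):
    """Gradient of the final u with respect to (u0, v0).

    Uses a simple fixed-period schedule (an assumption made here for
    brevity, not an optimized schedule).
    """
    # Forward sweep: store a forward restart checkpoint every `period` steps.
    restart = {}
    u, v = u0, v0
    for n in range(n_steps):
        if n % period == 0:
            restart[n] = (u, v)  # restart data: the full state
        u, v = forward_step(u, v)
    functional = u

    # Reverse sweep, one block of steps at a time.
    adj_u, adj_v = 1.0, 0.0  # derivative of the functional w.r.t. final (u, v)
    for start in reversed(range(0, n_steps, period)):
        stop = min(start + period, n_steps)
        u, v = restart.pop(start)  # load, then delete, the restart checkpoint
        nonlinear = []  # non-linear dependency data for this block: v only
        for n in range(start, stop):
            nonlinear.append(v)
            if n + 1 < stop:
                u, v = forward_step(u, v)
        for v_n in reversed(nonlinear):
            adj_u, adj_v = adjoint_step(adj_u, adj_v, v_n)
    return functional, (adj_u, adj_v)
```

With `period=1` every step start is checkpointed and no forward step is recomputed; larger periods trade storage for recomputation while yielding the same gradient.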

Paper Structure

This paper contains 17 sections, 2 equations, 9 figures, 3 tables, 1 algorithm.

Figures (9)

  • Figure 1: Visualization of the computational graph for two timesteps in a solver for the barotropic vorticity equation. Step 0 corresponds to initialization and a forward Euler step, and step 1 to a second order Adams-Bashforth step and evaluation of a functional.
  • Figure 2: As in Figure 1, but with the introduction of auxiliary steps to copy input parameters (the new step -1) and copy the output functional (the new step 2).
  • Figure 3: Visualization of the computational graph for three timesteps in a solver for the barotropic vorticity equation. Step 0 corresponds to initialization and a forward Euler step, step 1 to a second order Adams-Bashforth step, and step 2 to a third order Adams-Bashforth step and evaluation of a functional.
  • Figure 4: Two options considered when constructing the schedule, illustrated for $n = 5$ steps. Left: Case 1, storage of all non-linear dependency data in checkpoints and the intermediate storage. Right: Case 2, a single checkpointing unit. The schedules proceed from top to bottom. The numbered labels indicate the start of a given step, counting from zero. The black arrows pointing to the right, at the top, indicate forward advances. Below this a filled cross indicates a forward restart checkpoint, and a filled line with end bars a non-linear dependency data checkpoint, with checkpoints either stored as part of the indicated forward advance, or retained from previous forward advances. Dashed versions of these indicate a checkpoint which is loaded and then deleted. Deletes occur before any new checkpoints are stored. Red arrows pointing to the left indicate adjoint advances, occurring after loading of checkpoints and forward advances.
  • Figure 5: Options considered when constructing the schedule for case 3 (a), when $2 \le s \le n - 2$, illustrated for $n = 5$. For interpretation see Figure 4. Only the initial forward advance is shown. Storage of a forward restart checkpoint together with a forward advance of $m$ steps with $m \in \left\{ 1, \ldots, n - 1 \right\}$. There are $s - 1$ checkpointing units remaining for use when advancing the adjoint over the final $n - m$ steps. All $s$ checkpointing units can be used when advancing the adjoint over the first $m$ steps, with the indicated forward restart checkpoint deleted, after it is loaded, if needed.
  • ...and 4 more figures