Optimizations on Graph-Level for Domain Specific Computations in Julia and Application to QED

Anton Reinhard, Simeon Ehrig, René Widera, Michael Bussmann, Uwe Hernandez Acosta

TL;DR

The paper addresses the challenge of performance-portable execution of domain-specific HPC workloads by representing computations as Computable DAGs (CDAGs) and generating statically scheduled, compiled Julia code. It introduces the Julia package ComputableDAGs.jl and a workflow with the following steps:

  • Domain models and generators emit a CDAG.
  • Lightweight or detailed estimators estimate its cost.
  • Node operations apply graph-level optimizations.
  • The computation is statically scheduled onto the available devices.
  • The scheduled graph goes through code generation and execution.

The framework is demonstrated on perturbative quantum electrodynamics (QED) calculations of matrix elements for Compton scattering. It is also applied to the ABC model and to Strassen matrix multiplication, which illustrates that the approach is domain-agnostic. The results show substantial GPU acceleration and the benefits of graph-level optimization, and they also expose compile-time bottlenecks. The authors discuss future work including richer estimators, heterogeneous scheduling, vectorization, term rewriting, memory and energy considerations, and scaling to large HPC systems.
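The workflow above optimizes the graph through node operations and compares variants with cost estimators. The Python sketch below is a minimal illustration of that idea under stated assumptions. It merges compute nodes that apply the same function to the same inputs, which is a common-subexpression-elimination style rewrite. It then scores each graph with a toy estimator that sums fixed per-operation costs. The `Node` type, the `OP_COST` numbers, and the function names are hypothetical. They are not the ComputableDAGs.jl API, and the package's actual node operations may work differently.

```python
from dataclasses import dataclass

# Hypothetical per-operation costs for a toy "lightweight estimator";
# the numbers are placeholders, not measurements.
OP_COST = {"input": 0, "exp": 20, "sin": 20, "add": 1, "mul": 1}


@dataclass(frozen=True)
class Node:
    op: str              # function applied by this node ("input" for entry nodes)
    inputs: tuple = ()   # ids of the nodes whose outputs feed this node


def estimate_cost(graph):
    """Toy estimator: sum the fixed cost of every node in the graph."""
    return sum(OP_COST[node.op] for node in graph.values())


def merge_duplicates(graph, outputs):
    """Merge compute nodes that apply the same op to the same inputs.

    Assumes `graph` lists node ids in topological order (dicts keep
    insertion order), so every input is renamed before it is used.
    Returns the reduced graph and the renamed output ids.
    """
    canonical = {}   # (op, inputs) -> id of the node that is kept
    rename = {}      # any node id -> id of the node that replaces it
    merged = {}
    for nid, node in graph.items():
        new_inputs = tuple(rename[i] for i in node.inputs)
        key = (node.op, new_inputs)
        if node.op != "input" and key in canonical:
            rename[nid] = canonical[key]   # redundant node: drop it
            continue
        canonical[key] = nid
        rename[nid] = nid
        merged[nid] = Node(node.op, new_inputs)
    return merged, [rename[o] for o in outputs]


# Naive graph for sin(exp(x)) + sin(exp(x)), built without any sharing.
naive = {
    "x": Node("input"),
    "e1": Node("exp", ("x",)),
    "e2": Node("exp", ("x",)),
    "s1": Node("sin", ("e1",)),
    "s2": Node("sin", ("e2",)),
    "y": Node("add", ("s1", "s2")),
}
optimized, outputs = merge_duplicates(naive, ["y"])
```

In this example, the six-node graph shrinks to four nodes. The estimated cost roughly halves because each `exp` and `sin` is now evaluated only once.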

Abstract

Complex computational problems in science often consist of smaller parts whose compute requirements can differ substantially. For optimal efficiency, each subtask would have to be analyzed and scheduled on the best-suited hardware. Further considerations must be taken into account as well, such as parallelism, dependencies between subtasks, and data transfer speeds between devices. To this end, directed acyclic graphs (DAGs) are often used to represent such problems, making it possible to use as much of a machine's hardware as possible. In this paper, we present a software framework written in Julia that automatically and dynamically produces statically scheduled and compiled code. We lay theoretical foundations and add domain-specific information about the computation to the existing concepts of DAG scheduling, enabling optimizations that would otherwise be impossible. To illustrate the theory, we implement an example application: the computation of matrix elements for scattering processes with many external particles in quantum electrodynamics.
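The abstract's scheduling problem is to place each subtask on the best-suited device while respecting dependencies and transfer costs. A generic greedy list scheduler can illustrate it. The scheduler walks the DAG in topological order and puts each task on whichever device gives the earliest finish time. This textbook heuristic is swapped in for illustration only and is not the scheduler used by the authors. The task names, device names, costs, and the single uniform `transfer` time are hypothetical simplifications. The sketch requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def list_schedule(deps, cost, transfer):
    """Greedy static list scheduling by earliest finish time.

    deps:     task -> set of tasks it depends on
    cost:     task -> {device: execution time of the task on that device}
    transfer: time to move one result between two different devices
    Returns task -> (device, start, finish).
    """
    device_free = {}   # device -> time at which it becomes idle
    schedule = {}
    for task in TopologicalSorter(deps).static_order():
        best = None
        for device, run_time in cost[task].items():
            # Results produced on another device must be transferred first.
            ready = max(
                (schedule[d][2] + (0 if schedule[d][0] == device else transfer)
                 for d in deps.get(task, ())),
                default=0.0,
            )
            start = max(ready, device_free.get(device, 0.0))
            finish = start + run_time
            if best is None or finish < best[2]:
                best = (device, start, finish)
        schedule[task] = best
        device_free[best[0]] = best[2]
    return schedule


# Two independent tasks a and b feed task c; costs are made up.
deps = {"a": set(), "b": set(), "c": {"a", "b"}}
cost = {
    "a": {"cpu": 4.0, "gpu": 1.0},
    "b": {"cpu": 2.0, "gpu": 1.0},
    "c": {"cpu": 3.0, "gpu": 1.0},
}
plan = list_schedule(deps, cost, transfer=2.0)
```

With these made-up costs, the scheduler places `a` on the GPU and `b` on the CPU. It then runs `c` on the GPU after `b`'s result has been transferred. A realistic scheduler would also model per-link bandwidths and data sizes.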

Paper Structure

This paper contains 27 sections, 1 theorem, 8 equations, 13 figures, 1 table.

Key Result

Proposition 1

$\{\mathbb{D}_{T_D}, \mathbb{D}_{T_C}\}$ is a partition of $\mathbb{D}_T$.
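By the standard definition of a partition, Proposition 1 asserts the three conditions below. The sets themselves are defined in the full paper and are not reproduced here.

```latex
\mathbb{D}_{T_D} \cup \mathbb{D}_{T_C} = \mathbb{D}_T, \qquad
\mathbb{D}_{T_D} \cap \mathbb{D}_{T_C} = \emptyset, \qquad
\mathbb{D}_{T_D} \neq \emptyset, \quad \mathbb{D}_{T_C} \neq \emptyset .
```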

Figures (13)

  • Figure 1: The two Feynman diagrams contributing to the Compton scattering process $e^- \gamma \to e^- \gamma$ at tree-level. In the left diagram, the incoming electron first interacts with the incoming photon. The resulting inner line (also called a virtual particle) then interacts with the outgoing photon and outgoing electron. In the right diagram, the order of the photon interactions is reversed.
  • Figure 2: Example of a Computable DAG with functions assigned to every compute node. Data nodes are shown in blue, compute nodes in red. The CDAG takes two real numbers, $x_1$ and $x_2$, as inputs and produces another real number, $x_5$, as output. Following the edges, we find the intermediate values $x_3 := e^{x_1}$ and $x_4 := 5x_2 - 2$. Feeding these results into the third compute node yields $x_5 := x_4 \cdot \sin(x_3) = (5x_2 - 2) \cdot \sin(e^{x_1})$, which is the function the CDAG computes.
  • Figure 3: Modules of the software and their interactions
  • Figure 4: A flowchart showing how a DAG is compiled into a callable function using ComputableDAGs.jl.
  • Figure 5: The CDAG that computes the matrix element of the scattering process $e^- \gamma \to {e^-}' \gamma'$ at tree-level for a single spin and polarization combination. Nodes in blue are data nodes and nodes in red are compute nodes. The entry nodes (bottom row) contain the momenta of the particles participating in the process. Next, each base_state is calculated. Then, the four possible electron-photon combinations are formed. From these, the values of the two full diagrams are calculated and finally added to yield the matrix element.
  • ...and 8 more figures
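The CDAG in Figure 2 is small enough to trace the idea behind ComputableDAGs.jl by hand. First, a topological order of the compute nodes is fixed once, which serves as a static schedule. Straight-line code is then emitted in that order and compiled into a callable function. The sketch below does this in Python with `graphlib.TopologicalSorter` and `exec`. The expression templates, `generate_source`, and `compile_cdag` are hypothetical names for illustration. They do not reflect the package's API, which generates and compiles Julia code.

```python
import math
from graphlib import TopologicalSorter

# Compute nodes of the Figure 2 CDAG: output data node ->
# (expression template, input data nodes). Templates are hypothetical.
COMPUTE = {
    "x3": ("math.exp({0})", ("x1",)),
    "x4": ("5 * {0} - 2", ("x2",)),
    "x5": ("{1} * math.sin({0})", ("x3", "x4")),
}


def generate_source(compute, inputs, output):
    """Emit a straight-line Python function in one fixed topological order."""
    deps = {node: set(args) for node, (_, args) in compute.items()}
    lines = [f"def cdag_fn({', '.join(inputs)}):"]
    for node in TopologicalSorter(deps).static_order():
        if node in compute:   # entry (input) data nodes need no code
            template, args = compute[node]
            lines.append(f"    {node} = {template.format(*args)}")
    lines.append(f"    return {output}")
    return "\n".join(lines)


def compile_cdag(compute, inputs, output):
    """Compile the generated source and return the resulting function."""
    namespace = {"math": math}
    exec(generate_source(compute, inputs, output), namespace)
    return namespace["cdag_fn"]


cdag_fn = compile_cdag(COMPUTE, ["x1", "x2"], "x5")
```

Printing `generate_source(COMPUTE, ["x1", "x2"], "x5")` shows the emitted function. Each assignment uses only values computed on earlier lines, and calling `cdag_fn(x1, x2)` evaluates $(5x_2 - 2) \cdot \sin(e^{x_1})$.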

Theorems & Definitions (3)

  • Example B.1
  • Proposition 1
  • proof