Optimizations on Graph-Level for Domain Specific Computations in Julia and Application to QED
Anton Reinhard, Simeon Ehrig, René Widera, Michael Bussmann, Uwe Hernandez Acosta
TL;DR
The paper addresses performance-portable execution of domain-specific HPC workloads by representing computations as Computable DAGs (CDAGs) and generating statically scheduled, compiled Julia code. It introduces the Julia package ComputableDAGs.jl and a workflow in which domain models and generators emit a CDAG, lightweight or detailed estimators assess its cost, node operations apply graph-level optimizations, and a static scheduler assigns the computation to the available devices before code generation and execution. The framework is demonstrated on perturbative quantum electrodynamics (QED) calculations of matrix elements for Compton scattering, as well as on the ABC model and the Strassen algorithm, illustrating domain-agnostic applicability and substantial GPU acceleration while highlighting compile-time bottlenecks and the benefits of optimization. The authors discuss future work, including richer estimators, heterogeneous scheduling, vectorization, term rewriting, memory and energy considerations, and scaling to large HPC systems.
Abstract
Complex computational problems in science often consist of smaller parts whose compute requirements can differ substantially from one another. For optimal efficiency, each subtask would need to be analyzed and scheduled on the best-suited hardware. Other considerations must be taken into account, too, such as parallelism, dependencies between subtasks, and data transfer speeds between devices. To achieve this, directed acyclic graphs are often employed to represent these problems and to utilize as much of the hardware on a given machine as possible. In this paper, we present a software framework written in Julia capable of automatically and dynamically producing statically scheduled and compiled code. We lay theoretical foundations and add domain-specific information about the computation to the existing concepts of DAG scheduling, enabling optimizations that would otherwise be impossible. To illustrate the theory, we implement an example application: the computation of matrix elements for scattering processes with many external particles in quantum electrodynamics.
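The idea of statically scheduling a task DAG onto several devices can be illustrated with a minimal sketch. It is written in Python for brevity and does not use or mirror the ComputableDAGs.jl API. The function `static_schedule`, the task names, the device names, and all cost values are hypothetical, and data transfer costs between devices are ignored as a simplifying assumption. The sketch uses plain greedy list scheduling, not the scheduler described in the paper.

```python
from graphlib import TopologicalSorter


def static_schedule(deps, cost, devices):
    """Greedy list scheduling (illustrative only).

    deps:    task -> tuple of tasks it depends on
    cost:    task -> {device: estimated run time}; a missing device
             means the task cannot run there
    devices: ordered sequence of device names
    Returns (placement, finish): the chosen device and the estimated
    finish time for every task.
    """
    # Visit tasks so that every dependency comes before its dependents.
    order = list(TopologicalSorter(deps).static_order())
    device_free = {d: 0.0 for d in devices}  # time each device becomes idle
    placement, finish = {}, {}
    for task in order:
        # A task may start only after all of its inputs are available.
        ready = max((finish[p] for p in deps.get(task, ())), default=0.0)
        best = None
        for d in devices:
            if d not in cost[task]:
                continue
            end = max(ready, device_free[d]) + cost[task][d]
            # Strict comparison: ties go to the device listed first.
            if best is None or end < best[0]:
                best = (end, d)
        end, d = best
        placement[task], finish[task] = d, end
        device_free[d] = end
    return placement, finish


# Hypothetical four-task computation: one load step, two independent
# kernels that are cheaper on the GPU, and a final reduction.
deps = {"load": (), "a": ("load",), "b": ("load",), "sum": ("a", "b")}
cost = {
    "load": {"cpu": 1},
    "a": {"cpu": 4, "gpu": 1},
    "b": {"cpu": 4, "gpu": 1},
    "sum": {"cpu": 1, "gpu": 1},
}
placement, finish = static_schedule(deps, cost, ("cpu", "gpu"))
makespan = max(finish.values())
```

Because the schedule is computed before anything runs, the placement could be baked into generated code, which is the property the abstract refers to as statically scheduled. A realistic scheduler would also account for transfer costs and memory limits, which this sketch leaves out.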
