Automatic Tracing in Task-Based Runtime Systems

Rohan Yadav, Michael Bauer, David Broman, Michael Garland, Alex Aiken, Fredrik Kjolstad

TL;DR

Apophenia automatically identifies and traces repeated dependence analyses in task-based runtimes, acting as a JIT for dependence analysis to remove the need for manual trace annotations. It accomplishes this with dynamic string analyses on a stream of hashed task tokens, using a suffix-array-based algorithm to find long, non-overlapping repeats and a trace replayer that schedules memoized traces via a trie-based matching system. The approach, implemented atop the Legion runtime, achieves performance within 0.92×–1.03× of manually traced code and delivers end-to-end speedups up to 2.82× on large HPC applications across Perlmutter and Eos, including complex modular codes such as cuPyNumeric, CFD, TorchSWE, and FlexFlow. This work demonstrates that automatic trace identification can significantly reduce runtime overheads, enable tracing for composition-heavy software, and scale to real-world applications without programmer-annotated traces.
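To make the suffix-array idea above more concrete, here is a minimal sketch of finding repeated, non-overlapping substrings in a token stream. It is a simplified stand-in and not Apophenia's actual algorithm: the function names (`suffix_array`, `repeat_candidates`), the `min_len` parameter, and the naive construction are all assumptions made for illustration. The sketch sorts all suffixes, compares each adjacent pair in sorted order, and truncates the shared prefix so that the two occurrences cannot overlap.

```python
def suffix_array(tokens):
    # Naive construction by sorting suffix start indices; illustration only.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def common_prefix_len(tokens, i, j):
    # Length of the longest common prefix of the suffixes starting at i and j.
    n = len(tokens)
    k = 0
    while i + k < n and j + k < n and tokens[i + k] == tokens[j + k]:
        k += 1
    return k


def repeat_candidates(tokens, min_len=2):
    """Collect substrings (as tuples) that occur at least twice without overlap.

    Hypothetical helper: adjacent suffixes in sorted order share the longest
    prefixes, so each adjacent pair yields one candidate repeat.
    """
    sa = suffix_array(tokens)
    found = set()
    for a, b in zip(sa, sa[1:]):
        lcp = common_prefix_len(tokens, a, b)
        # Cap the length at the distance between the two start positions
        # so the two occurrences do not overlap.
        length = min(lcp, abs(a - b))
        if length >= min_len:
            found.add(tuple(tokens[a:a + length]))
    return found


# Tokens here are characters; in a runtime they would be task hashes.
print(sorted(repeat_candidates("aabcbcbaa")))
# prints [('a', 'a'), ('b', 'c'), ('c', 'b')]
```

A production implementation would build the suffix array and longest-common-prefix values in near-linear time and would then choose which candidates to keep; this sketch only shows where the candidates come from.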

Abstract

Implicitly parallel task-based runtime systems often perform dynamic analysis to discover dependencies in and extract parallelism from sequential programs. Dependence analysis becomes expensive as task granularity drops below a threshold. Tracing techniques have been developed where programmers annotate repeated program fragments (traces) issued by the application, and the runtime system memoizes the dependence analysis for those fragments, greatly reducing overhead when the fragments are executed again. However, manual trace annotation can be brittle and is not easily applicable to complex programs built through the composition of independent components. We introduce Apophenia, a system that automatically traces the dependence analysis of task-based runtime systems, removing the burden of manual annotations from programmers and enabling new and complex programs to be traced. Apophenia identifies traces dynamically through a series of dynamic string analyses, which find repeated program fragments in the stream of tasks issued to the runtime system. We show that Apophenia achieves between 0.92x and 1.03x the performance of manually traced programs, and effectively traces previously untraced programs, yielding speedups of between 0.91x and 2.82x on the Perlmutter and Eos supercomputers.
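The memoization idea behind tracing can be illustrated with a toy model. The sketch below is not Legion's or Apophenia's implementation: the `Task` record, the `analyze` function, and the `TraceCache` class are hypothetical names, and the dependence rule (two tasks conflict when they touch a common region and at least one writes it) is a simplifying assumption. It also ignores dependences on tasks outside the fragment, which a real runtime must handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical task record: a name plus the regions it reads and writes.
    name: str
    reads: frozenset
    writes: frozenset


def analyze(tasks):
    """Toy dependence analysis: task j depends on an earlier task i when they
    share a region and at least one of the two writes it."""
    deps = []
    for j, tj in enumerate(tasks):
        for i in range(j):
            ti = tasks[i]
            if (ti.writes & (tj.reads | tj.writes)) or (ti.reads & tj.writes):
                deps.append((i, j))
    return deps


class TraceCache:
    """Memoizes the analysis of a fragment, keyed by its exact task sequence."""

    def __init__(self):
        self._memo = {}
        self.analyses_run = 0

    def dependences(self, fragment):
        key = tuple(fragment)  # frozen dataclasses are hashable
        if key not in self._memo:
            self.analyses_run += 1  # pay for the analysis only on a miss
            self._memo[key] = analyze(fragment)
        return self._memo[key]


fragment = [
    Task("init", frozenset(), frozenset({"x"})),
    Task("scale", frozenset({"x"}), frozenset({"y"})),
    Task("sum", frozenset({"x", "y"}), frozenset({"z"})),
]
cache = TraceCache()
for _ in range(3):  # the same fragment issued three times
    deps = cache.dependences(fragment)
print(deps, cache.analyses_run)
# prints [(0, 1), (0, 2), (1, 2)] 1
```

The quadratic analysis runs once and the two later iterations reuse its result, which is the saving that tracing targets when tasks are small.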

Paper Structure (42 sections, 10 figures, 2 algorithms)

Figures (10)

  • Figure 1: A cuPyNumeric program and the stream of tasks it issues at runtime. An intuitive trace around the main loop does not correspond to a repeated program fragment.
  • Figure 2: Example of a task stream and fixed trace set $T$ with an invalid matching function $f$, and two matching functions with different $\textsf{coverage}(T, f)$.
  • Figure 3: Visualization of Apophenia's dynamic analysis.
  • Figure 4: Execution of the repeat-finding algorithm on "aabcbcbaa". The candidates for each suffix pair are shown between the pair.
  • Figure 5: Visualization of Apophenia's buffer sampling strategy on a buffer of size 8. After processing the $i$'th task, Apophenia mines the buffer slice labeled $i$.
  • ...and 5 more figures