EDAN: Towards Understanding Memory Parallelism and Latency Sensitivity in HPC
Siyuan Shen, Mikhail Khalilov, Lukas Gianinazzi, Timo Schneider, Marcin Chrapek, Jai Dayal, Manisha Gajbe, Robert Wisniewski, Torsten Hoefler
TL;DR
EDAN introduces a toolchain that converts runtime instruction traces into execution DAGs (eDAGs) to quantify memory latency sensitivity in HPC workloads. It develops a Brent-inspired memory cost model and derives two metrics, λ and Λ, for ranking and comparing memory latency sensitivity across programs; it also estimates theoretical bandwidth from data movement on the eDAG. Validation against gem5 on PolyBench shows reasonable agreement in latency-sensitivity rankings together with a substantial productivity advantage, and case studies on HPCG and LULESH illustrate how caching interacts with memory depth. By providing a fast, scalable method for analyzing memory-latency effects in HPC applications, the work supports architecture-aware programming and hardware design, including decisions about cache sizing, memory disaggregation, and parallelization strategies.
Abstract
Resource disaggregation is a promising technique for improving the efficiency of large-scale computing systems. However, it comes at the cost of increased memory access latency, because data must be transferred between remote nodes over the network fabric. It is therefore crucial to ascertain an application's memory latency sensitivity in order to minimize the overall performance impact. Existing tools for measuring memory latency sensitivity often rely on custom ad-hoc hardware or cycle-accurate simulators, which can be inflexible and time-consuming. To address this, we present EDAN (Execution DAG Analyzer), a novel performance analysis tool that uses an application's runtime instruction trace to generate the corresponding execution DAG. This approach allows us to estimate the latency sensitivity of sequential programs and to investigate the impact of different hardware configurations. EDAN not only lets us calculate theoretical bounds on performance metrics but also gives insight into the memory-level parallelism inherent to HPC applications. We apply EDAN to applications and benchmarks such as PolyBench, HPCG, and LULESH to reveal the characteristics of their intrinsic memory-level parallelism and latency sensitivity.
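
The trace-to-DAG idea can be illustrated with a minimal sketch. This is not EDAN's implementation: the toy trace format, the function names, and the cost assignment (each load costs `mem_latency`, every other instruction costs 1) are assumptions made for illustration. The sketch builds a dependency DAG from read-after-write dependencies only, computes total work W and depth D (the critical path), and applies the classical work/depth scheduling bounds max(W/p, D) ≤ T_p ≤ W/p + D. How the depth grows with latency is used here as a crude proxy for latency sensitivity; it is not the paper's λ or Λ.

```python
# Hypothetical toy trace: (opcode, destination, sources). Not EDAN's trace format.
TRACE = [
    ("load", "r1", ["a0"]),        # independent load
    ("load", "r2", ["a1"]),        # independent load, can overlap with the first
    ("add",  "r3", ["r1", "r2"]),
    ("load", "r4", ["r3"]),        # dependent load: its address comes from r3
    ("add",  "r5", ["r4", "r1"]),
]

def build_edag(trace):
    """Return, for each instruction, the indices of the instructions it depends on.

    Only read-after-write dependencies through named locations are tracked
    (an assumption of this sketch).
    """
    last_writer = {}
    preds = []
    for i, (_, dst, srcs) in enumerate(trace):
        preds.append(sorted({last_writer[s] for s in srcs if s in last_writer}))
        last_writer[dst] = i
    return preds

def work_and_depth(trace, preds, mem_latency, alu_cost=1):
    """Return (W, D): the sum of all node costs and the longest weighted path."""
    cost = [mem_latency if op == "load" else alu_cost for op, _, _ in trace]
    finish = []
    for i, c in enumerate(cost):
        # Trace order is a topological order, so predecessors are already done.
        start = max((finish[p] for p in preds[i]), default=0)
        finish.append(start + c)
    return sum(cost), max(finish, default=0)

def schedule_bounds(work, depth, p):
    """Classical lower and upper bounds on the runtime with p parallel units."""
    return max(work / p, depth), work / p + depth

if __name__ == "__main__":
    preds = build_edag(TRACE)
    for latency in (100, 200):
        w, d = work_and_depth(TRACE, preds, latency)
        print(latency, w, d, schedule_bounds(w, d, 2))
```

In this toy trace, raising the load latency from 100 to 200 increases the depth from 202 to 402, that is, by two cycles per added cycle of latency, because two of the three loads lie on the critical path. The two independent loads overlap, which is the kind of memory-level parallelism that a DAG-based view exposes without running a cycle-accurate simulation.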
