Enabling Heterogeneous Performance Analysis for Scientific Workloads
Maksymilian Graczyk, Vincent Desbiolles, Stefan Roiser, Andrea Guerrieri
TL;DR
This paper tackles profiling challenges in heterogeneous scientific workloads by evaluating architecture-agnostic performance analysis with Adaptyst. It focuses on two eBPF-based profiling options, Uprobes and USDT, and assesses their runtime overhead and deployment complexity. Using a small C benchmark on a dedicated workstation, the study reports runtime overheads of around 4.8–5.1%. It also finds that Uprobes incurs more system time, while USDT shows slightly higher run-to-run variability. The results inform a roadmap for integrating eBPF-based profiling into Adaptyst and for extending its capabilities to non-CPU devices, advancing heterogeneous performance analysis for scientific workloads.
Abstract
Heterogeneous computing integrates diverse processing elements, such as CPUs, GPUs, and FPGAs, within a single system, aiming to leverage the strengths of each architecture to optimize performance and energy consumption. In this context, efficient performance analysis plays a critical role in determining the most suitable platform for dispatching tasks, ensuring that workloads are allocated to the processing units where they execute most effectively. Adaptyst is a novel, ongoing effort at CERN that aims to develop an open-source, architecture-agnostic performance analysis tool for scientific workloads. This study explores the performance and implementation complexity of two built-in eBPF-based methods, Uprobes and USDT, with the aim of outlining a roadmap for their future integration into Adaptyst and advancing toward heterogeneous performance analysis capabilities.
