Extrae.jl: Julia bindings for the Extrae HPC Profiler

Sergio Sanchez-Ramirez, Mosè Giordano

TL;DR

Extrae.jl provides Julia bindings to the Extrae HPC profiler, enabling seamless generation of Paraver traces by mapping Julia's MPI, OpenMP-like, and threading models onto Paraver's process and resource models. The design supports instrumentation via library interception or binary rewriting, annotation types for states, events, and communications, and optional hardware-counter sampling through PAPI, together with practical guidance on distributed mappings. An evaluation on a Taylor-Green vortex MPI application shows how Paraver can reveal parallelism issues, bottlenecks such as MPI_Waitany and MPI_Allreduce, and network bandwidth dynamics, validating the approach for Julia HPC workloads. Overall, the work lowers the barrier to profiling for Julia users, enables cross-ecosystem analysis, and points to future integrations with other HPC tools and trace formats.

Abstract

The Julia programming language has gained acceptance within the High-Performance Computing (HPC) community due to its ability to tackle the two-language problem: Julia code feels as high-level as Python, yet developers can tune it to C-level performance. But to squeeze out every drop of performance, Julia needs to integrate with advanced performance analysis tools, also known as profilers. In this work, we present Extrae.jl, a Julia package that interfaces with the Extrae profiler.

Paper Structure

This paper contains 7 sections and 5 figures.

Figures (5)

  • Figure 1: Instantaneous parallelism during the workload of the application, measured as the number of MPI ranks that are not idle at a given moment. The maximum value is 16 and the minimum is 1.
  • Figure 2: Timeline of MPI calls for (a) the whole runtime of the application and (b) a zoom on a couple of iterations of the main workload. The horizontal axis is time, and each row shows the events that took place on one MPI rank. Blocks represent MPI calls: color identifies the MPI routine, and each block's start and end positions mark the duration of the call. Yellow lines represent messages sent between ranks.
  • Figure 3: Connectivity pattern between MPI ranks, measured as the number of messages sent from MPI rank x to rank y. Grey cells mean no communication, while green cells represent 2,016 messages sent.
  • Figure 4: Fraction of time spent in each MPI routine. The dispersion reflects how this time varies across MPI ranks.
  • Figure 5: Node network bandwidth, in MB/s, over the time region shown in Figure 2(b). Only one node is shown because all MPI ranks run on the same node.
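The instantaneous-parallelism metric plotted in Figure 1 (number of MPI ranks not idle at each moment) can be derived from per-rank busy intervals with a simple sweep line. The sketch below is illustrative only and is not how Extrae.jl or Paraver computes it: it assumes the busy (non-idle) intervals of each MPI rank have already been extracted from a trace as non-overlapping, half-open [start, end) pairs, and the function name `instantaneous_parallelism` is hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# Half-open busy interval [start, end) of one MPI rank (assumed input format).
# Intervals belonging to a single rank are assumed not to overlap.
Interval = Tuple[float, float]


def instantaneous_parallelism(
    busy: Dict[int, List[Interval]],
) -> List[Tuple[float, int]]:
    """Return (time, busy_ranks) breakpoints of a step function.

    Each value holds from its breakpoint until the next one.
    Hypothetical helper for illustration; not an Extrae.jl API.
    """
    # One +1 event when a rank becomes busy, one -1 when it goes idle.
    events: List[Tuple[float, int]] = []
    for intervals in busy.values():
        for start, end in intervals:
            if end > start:  # ignore empty intervals
                events.append((start, +1))
                events.append((end, -1))
    events.sort()

    steps: List[Tuple[float, int]] = []
    active = 0
    # Apply all changes at the same timestamp together, so a rank going
    # idle exactly when another becomes busy does not create a spurious step.
    for t, group in groupby(events, key=lambda e: e[0]):
        active += sum(delta for _, delta in group)
        if not steps or steps[-1][1] != active:
            steps.append((t, active))
    return steps


if __name__ == "__main__":
    ranks = {0: [(0.0, 4.0)], 1: [(1.0, 3.0)], 2: [(2.0, 6.0)]}
    for t, n in instantaneous_parallelism(ranks):
        print(f"t={t}: {n} busy rank(s)")
```

For example, three ranks busy over [0, 4), [1, 3) and [2, 6) yield the breakpoints (0, 1), (1, 2), (2, 3), (3, 2), (4, 1) and (6, 0). Grouping events by timestamp is a design choice that keeps back-to-back activity on different ranks from producing zero-length dips in the curve.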