Pipit: Scripting the analysis of parallel execution traces

Abhinav Bhatele; Rakrish Dhakal; Alexander Movsesyan; Aditya K. Ranjan; Onur Cankur

TL;DR

Pipit addresses the challenge of scalable, automated analysis of parallel execution traces by providing a Python-based API built on top of pandas that reads multiple trace formats into a uniform DataFrame model. It unifies per-process/per-thread trace data, offers high-level and low-level operations for computing inclusive/exclusive times, call graphs, and communication analyses, and includes data reduction and visualization capabilities. The library supports cross-run comparisons and pattern detection, enabling scriptable performance tuning workflows beyond what GUI tools offer. By delivering a modular, extensible, and open-source solution, Pipit aims to streamline reproducible performance analysis and accelerate HPC optimization across diverse trace formats and applications.

Abstract

Performance analysis is a critical step in the oft-repeated, iterative process of performance tuning of parallel programs. Per-process, per-thread traces (detailed logs of events with timestamps) enable in-depth analysis of parallel program execution to identify different kinds of performance issues. Trace collection tools often provide a graphical tool to analyze the trace output. However, these GUI-based tools only support specific file formats, are challenging to scale to large trace sizes, limit data exploration to the implemented graphical views, and do not support automated comparisons of two or more datasets. In this paper, we present a programmatic approach to analyzing parallel execution traces by leveraging pandas, a powerful Python-based data analysis library. We have developed a Python library, Pipit, on top of pandas that reads traces in different file formats (OTF2, HPCToolkit, Projections, Nsight Systems, etc.) and provides a uniform data structure in the form of a pandas DataFrame. Pipit provides operations to aggregate, filter, and transform the events in a trace to present the data in different ways. We also provide several functions to quickly and easily identify performance issues in parallel executions. More importantly, the API is easily extensible to support custom analyses by different end users.
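To make the idea of a trace as a DataFrame concrete, the following is a minimal sketch written in plain pandas, not Pipit's own API. The column names (`time`, `event`, `name`, `process`), the sample data, and the helper `call_times` are all assumptions made for illustration. The sketch loads a small CSV trace, matches each Enter event with its Leave event per process, and derives inclusive and exclusive times per function.

```python
import io

import pandas as pd

# Hypothetical CSV trace: one row per event. The column names are
# assumptions for this sketch and are not taken from Pipit.
CSV_TRACE = """time,event,name,process
0.0,Enter,main,0
1.0,Enter,compute,0
4.0,Leave,compute,0
5.0,Leave,main,0
0.0,Enter,main,1
2.0,Enter,compute,1
3.0,Leave,compute,1
5.0,Leave,main,1
"""


def call_times(events: pd.DataFrame) -> pd.DataFrame:
    """Return one row per function call with inclusive and exclusive time.

    Hypothetical helper: walks each process's events in time order and
    uses a stack to pair every Enter with its matching Leave.
    """
    rows = []
    for proc, group in events.groupby("process"):
        # Each stack entry: [function name, enter time, time spent in callees]
        stack = []
        ordered = group.sort_values("time", kind="stable")
        for ev in ordered.itertuples(index=False):
            if ev.event == "Enter":
                stack.append([ev.name, ev.time, 0.0])
            elif ev.event == "Leave":
                name, start, child_time = stack.pop()
                inclusive = ev.time - start
                rows.append(
                    {
                        "process": proc,
                        "name": name,
                        "inclusive": inclusive,
                        # Exclusive time excludes time spent in callees.
                        "exclusive": inclusive - child_time,
                    }
                )
                if stack:
                    # Charge this call's inclusive time to its caller.
                    stack[-1][2] += inclusive
    return pd.DataFrame(rows)


events = pd.read_csv(io.StringIO(CSV_TRACE))
calls = call_times(events)

# Aggregate over all calls and processes to get a flat profile.
profile = calls.groupby("name")[["inclusive", "exclusive"]].sum()
print(profile)
```

Once the events sit in a DataFrame, further analyses such as filtering by process or comparing two runs reduce to ordinary pandas operations on `events` or `calls`.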

Paper Structure (23 sections, 13 figures, 1 table)

This paper contains 23 sections, 13 figures, 1 table.

Motivation
Background and Related Work
Execution Traces and Trace Collection Tools
Trace Visualization Tools
Other Related Analysis Tools
The Pipit Library
The trace as a pandas DataFrame
Reading a dataset
Generating a call graph
The Pipit API
Extracting caller-callee relationships
Analyzing summary performance
Analyzing communication performance
Identifying performance issues
Data Reduction
...and 8 more sections

Figures (13)

Figure 1: A sample trace file in CSV format (left), and the corresponding events DataFrame generated by Pipit after reading it (right) using the code snippet at the bottom.
Figure 2: Time profile of a Tortuga trace with 64 processes.
Figure 3: Communication matrix of a Laghos execution on 32 processes, with a linear colormap (left) and logarithmic colormap (right).
Figure 4: Message size histogram of a Laghos execution on 32 processes. We see that messages are not distributed uniformly.
Figure 5: Performance of the OTF2 Reader and comm_matrix for various traces of AMG and Laghos (left). Strong scaling performance of the OTF2 Reader for AMG 128-process and Laghos 256-process traces (center). Memory consumption of the OTF2 Reader for various traces of AMG and Laghos (right).
...and 8 more figures
