Automated Programmatic Performance Analysis of Parallel Programs
Onur Cankur, Aditya Tomar, Daniel Nichols, Connor Scully-Allison, Katherine E. Isaacs, Abhinav Bhatele
TL;DR
The paper tackles the challenge of analyzing large-scale parallel performance data by introducing Chopper, a Python API built on Hatchet that enables high-level, configurable analyses of both single-run and multi-run executions. It provides programmatic tools to detect load imbalance, hot paths, scalability bottlenecks, and correlations between metrics and calling context tree (CCT) nodes, along with variability analysis, through a unified interface that integrates with existing Python visualization stacks. The authors implement a range of single-run and multi-run capabilities and demonstrate reading multiple profiles, unifying GraphFrames, and computing per-CCT-node efficiency and speedup across five HPC applications (AMG, Laghos, LULESH, Quicksilver, Tortuga). Through case studies and API performance tests, the work shows that Chopper reduces developer effort and supports scalable, reproducible performance analysis workflows. The authors suggest future enhancements in predictive modeling for correlation, customizable plotting, and GPU performance analysis to broaden applicability.
Abstract
Developing efficient parallel applications is critical to advancing scientific development but requires significant performance analysis and optimization. Performance analysis tools help developers manage the increasing complexity and scale of performance data, but they often rely on the user to manually explore low-level data and are rigid in how the data can be manipulated. We propose a Python-based API, Chopper, which provides high-level and flexible performance analysis for both single and multiple executions of parallel applications. Chopper facilitates performance analysis and reduces developer effort by providing configurable high-level methods for common performance analysis tasks: calculating load imbalance, hot paths, scalability bottlenecks, correlation between metrics and calling context tree (CCT) nodes, and causes of performance variability. These methods live within a robust and mature Python environment that provides fluid access to lower-level data manipulations. We demonstrate how Chopper allows developers to quickly and succinctly explore performance and identify issues across applications such as AMG, Laghos, LULESH, Quicksilver, and Tortuga.
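To make one of the listed tasks concrete, the following is a minimal sketch of a load imbalance calculation in plain pandas. It does not use Chopper's or Hatchet's API. The function name `load_imbalance`, the column names (`node`, `rank`, `time`), and the toy profile are all hypothetical, and the max-over-mean ratio is a common definition of load imbalance assumed here for illustration, not necessarily the one Chopper implements.

```python
import pandas as pd


def load_imbalance(df, metric="time"):
    """Per call-tree node, compare the slowest rank to the average rank.

    Hypothetical helper: imbalance is assumed to be max / mean of `metric`
    across ranks, so 1.0 means perfectly balanced work.
    """
    grouped = df.groupby("node")[metric]
    result = pd.DataFrame({"mean": grouped.mean(), "max": grouped.max()})
    result["imbalance"] = result["max"] / result["mean"]
    # Most imbalanced nodes first, so likely bottlenecks appear at the top.
    return result.sort_values("imbalance", ascending=False)


# Toy profile (made-up numbers): one row per (node, MPI rank) pair.
profile = pd.DataFrame(
    {
        "node": ["main"] * 4 + ["solve"] * 4,
        "rank": [0, 1, 2, 3, 0, 1, 2, 3],
        "time": [10.0, 10.0, 10.0, 10.0, 2.0, 2.0, 2.0, 6.0],
    }
)

imbalance = load_imbalance(profile)
print(imbalance)
```

In this toy data, `solve` is ranked first because one rank takes three times as long as the others, while `main` is perfectly balanced. A high-level API in the spirit of the one described would wrap this kind of grouping and ranking behind a single configurable call.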
