PASTA: A Modular Program Analysis Tool Framework for Accelerators

Mao Lin; Hyeran Jeon; Keren Zhou

TL;DR

PASTA provides detailed performance insights at significantly lower overhead than conventional analysis tools, thanks to its GPU-accelerated backend, making it well suited to modern accelerator-based computing environments.

Abstract

The increasing complexity and diversity of hardware accelerators in modern computing systems demand flexible, low-overhead program analysis tools. We present PASTA, a low-overhead and modular Program AnalysiS Tool Framework for Accelerators. PASTA abstracts over low-level profiling APIs and diverse deep learning frameworks, offering users a unified interface to capture and analyze runtime events at multiple levels. Its extensible design enables researchers and practitioners to rapidly prototype custom tools with minimal overhead. We demonstrate the utility of PASTA by developing several analysis tools, including a deep learning workload characterization tool and a UVM optimization tool. Through an extensive evaluation of mainstream deep learning workloads on NVIDIA and AMD GPUs, in both single- and multi-GPU scenarios, we demonstrate PASTA's broad applicability. On NVIDIA GPUs, we further show that PASTA provides detailed performance insights at significantly lower overhead than conventional analysis tools, running up to 1.3×10^4 times faster, thanks to its GPU-accelerated backend. PASTA strikes a practical balance among usability, extensibility, and efficiency, making it well suited to modern accelerator-based computing environments.
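The abstract describes PASTA's structure only at a high level: runtime events are captured behind a unified interface and handed to small, independently written analysis tools. The sketch below illustrates that idea as a plain publish/subscribe dispatcher, plus a start/stop filter in the spirit of the "Range-Specific Analysis" section. It is a hypothetical, CPU-only illustration under those assumptions: `Event`, `EventBus`, `MemRefCounter`, the event kinds, and the kernel names are all invented for this example, are not PASTA's actual API, and say nothing about how its GPU-accelerated backend works.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    """Hypothetical runtime event; not PASTA's real event type."""
    kind: str         # e.g. "range" or "mem_access" (invented kinds)
    name: str         # kernel name, or "start"/"stop" for range markers
    payload: int = 0  # e.g. number of memory references in this batch


class EventBus:
    """Hypothetical dispatcher: forwards each event to the tools subscribed to its kind."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, kind: str, handler: Callable[[Event], None]) -> None:
        self._handlers[kind].append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._handlers[event.kind]:
            handler(event)


class MemRefCounter:
    """Hypothetical tool: sums memory references per kernel, only inside a start/stop range."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)
        self.active = False  # events outside the range are ignored

    def on_range(self, event: Event) -> None:
        # "start" opens the range; any other marker (e.g. "stop") closes it.
        self.active = event.name == "start"

    def on_mem_access(self, event: Event) -> None:
        if self.active:
            self.counts[event.name] += event.payload

    def hottest_kernel(self) -> str:
        # Kernel with the highest memory reference count inside the range.
        return max(self.counts, key=self.counts.get)


# Usage: register the tool's handlers, then replay a synthetic event stream.
bus = EventBus()
tool = MemRefCounter()
bus.subscribe("range", tool.on_range)
bus.subscribe("mem_access", tool.on_mem_access)

events = [
    Event("mem_access", "warmup_kernel", 999),  # before "start": ignored
    Event("range", "start"),
    Event("mem_access", "gemm", 400),
    Event("mem_access", "softmax", 120),
    Event("mem_access", "gemm", 100),
    Event("range", "stop"),
]
for e in events:
    bus.publish(e)

print(dict(tool.counts))      # {'gemm': 500, 'softmax': 120}
print(tool.hottest_kernel())  # gemm
```

Keeping each tool as a set of handlers on a shared dispatcher lets a new analysis be added without touching the event source, which is the kind of extensibility the abstract claims. A real low-overhead implementation would also need to aggregate events close to where they are produced (the abstract attributes PASTA's speedup to a GPU-accelerated backend), which this sketch does not attempt.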

Paper Structure (46 sections, 15 figures, 5 tables)

Introduction
Background and Related Work
GPU Performance Analysis
DL Workload Performance Analysis
Design and Methodology
Overall Design
Pasta Modules
Workflow
Support for Diverse GPU Platforms
Support for Diverse DL Frameworks
Advanced Features
Range-Specific Analysis
Inefficiency Location Utilities
Generalization to Emerging Accelerators and Workloads
Extensibility for Diverse Analyses
...and 31 more sections

Figures (15)

Figure 1: Design of Pasta.
Figure 2: Comparison of CPU- and GPU-based analysis models.
Figure 3: Workflow of Pasta.
Figure 4: Cross-layer call stack of the kernel with the highest memory reference count during BERT inference. The trace spans Python-level code, PyTorch modules, and low-level C++/CUDA operations.
Figure 5: Codebase structure of Pasta.
...and 10 more figures