TaPS: A Performance Evaluation Suite for Task-based Execution Frameworks

J. Gregory Pauloski; Valerie Hayot-Sasson; Maxime Gonthier; Nathaniel Hudson; Haochen Pan; Sicheng Zhou; Ian Foster; Kyle Chard

TL;DR

TaPS addresses the absence of standardized benchmarks for task-based execution frameworks by providing a framework-agnostic Application Model, an extensible Executor plugin system, and a suite of real and synthetic reference applications. Applications are written once and can then be benchmarked with multiple executors and data-management plugins, enabling reproducible comparisons of overheads, data transfer, and scalability across Dask, Parsl, Ray, Globus Compute, and TaskVine. The paper describes the design and implementation of TaPS and evaluates it across diverse workloads (e.g., Cholesky, Protein Docking, Federated Learning, MapReduce, Molecular Design, Montage, Failure Injection, Synthetic Workflow) to show how TaPS quantifies execution performance and can guide future optimizations. Overall, TaPS aims to serve as a practical, long-lived benchmarking standard for parallel task execution on heterogeneous hardware, making cross-framework comparisons fair and reproducible.

Abstract

Task-based execution frameworks, such as parallel programming libraries, computational workflow systems, and function-as-a-service platforms, enable the composition of distinct tasks into a single, unified application designed to achieve a computational goal. Task-based execution frameworks abstract the parallel execution of an application's tasks on arbitrary hardware. Research into these task executors has accelerated as computational sciences increasingly need to take advantage of parallel compute and/or heterogeneous hardware. However, the lack of evaluation standards makes it challenging to compare and contrast novel systems against existing implementations. Here, we introduce TaPS, the Task Performance Suite, to support continued research in parallel task executor frameworks. TaPS provides (1) a unified, modular interface for writing and evaluating applications using arbitrary execution frameworks and data management systems and (2) an initial set of reference synthetic and real-world science applications. We discuss how the design of TaPS supports the reliable evaluation of frameworks and demonstrate TaPS through a survey of benchmarks using the provided reference applications.
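The "write once, evaluate with arbitrary executors" idea in the abstract can be illustrated with a minimal sketch built only on Python's standard `concurrent.futures` interface. This is not the TaPS API: the names `SumOfSquaresApp`, `SerialExecutor`, and `benchmark` are hypothetical and introduced purely for illustration, and the assumption is that each execution framework can be wrapped behind an `Executor`-like `submit` method.

```python
from __future__ import annotations

import time
from concurrent.futures import Executor, Future, ThreadPoolExecutor
from typing import Callable


def square(x: int) -> int:
    """A trivial task; real applications would submit heavier work."""
    return x * x


class SerialExecutor(Executor):
    """Hypothetical plugin: runs each task inline in the calling thread."""

    def submit(self, fn, /, *args, **kwargs) -> Future:
        future: Future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:  # propagate task errors through the future
            future.set_exception(exc)
        return future


class SumOfSquaresApp:
    """Hypothetical application written once against the Executor interface."""

    def __init__(self, n: int) -> None:
        self.n = n

    def run(self, executor: Executor) -> int:
        futures = [executor.submit(square, i) for i in range(self.n)]
        return sum(f.result() for f in futures)


def benchmark(
    app: SumOfSquaresApp,
    factories: dict[str, Callable[[], Executor]],
) -> dict[str, tuple[int, float]]:
    """Run the same app under each executor; record (result, makespan in s)."""
    results: dict[str, tuple[int, float]] = {}
    for name, factory in factories.items():
        with factory() as executor:
            start = time.perf_counter()
            value = app.run(executor)
            elapsed = time.perf_counter() - start
        results[name] = (value, elapsed)
    return results


if __name__ == "__main__":
    factories = {
        "serial": SerialExecutor,
        "threads": lambda: ThreadPoolExecutor(max_workers=4),
    }
    for name, (value, elapsed) in benchmark(SumOfSquaresApp(100), factories).items():
        print(f"{name}: result={value} makespan={elapsed:.6f}s")
```

Because the application only depends on `submit` and futures, swapping the executor changes where tasks run without touching application code, which is the property that makes cross-framework comparison possible.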

Paper Structure (25 sections, 5 figures, 3 tables)

Introduction
Background and Related Work
Design and Implementation
Application Model
Writing Applications
Application Execution
Task Executor Model
Supported Task Executors
Task Data Model
Logging and Metrics
Task Life-cycle
Applications
Cholesky Factorization
Protein Docking
Federated Learning
...and 10 more sections

Figures (5)

Figure 1: Overview of the TaPS stack.
Figure 2: Example task dependency diagrams for each application. In most applications, the exact structure depends on the application configuration.
Figure 3: Average application makespan over three runs. Error bars denote standard deviation.
Figure 4: Executor scaling performance with no-op tasks. Each configuration is repeated three times and shaded regions represent the standard deviation.
Figure 5: Average round-trip time for no-op tasks as a function of input/output data size. Error bars denote standard deviation from three runs of 320 tasks ($10\times 32$ workers). The Globus Compute baseline is not evaluated at 10 MB due to task payload limits of the Globus Compute service.
