Specx: a C++ task-based runtime system for heterogeneous distributed architectures

Paul Cardosi, Bérenger Bramas

TL;DR

This article presents Specx, a task-based runtime system written in modern C++. Specx supports distributed heterogeneous computing by simultaneously exploiting central processing units (CPUs) and graphics processing units (GPUs), by incorporating communication into the task graph, and by letting algorithms be described independently of the hardware.

Abstract

Parallelization is needed everywhere, from laptops and mobile phones to supercomputers. Among parallel programming models, task-based programming has demonstrated strong potential and is widely used in high-performance scientific computing. It allows efficient parallelization across distributed heterogeneous computing nodes, and it also encourages well-structured source code because algorithms are described independently of the hardware. In this paper, we present Specx, a task-based runtime system written in modern C++. Specx supports distributed heterogeneous computing by simultaneously exploiting CPUs and GPUs (CUDA/HIP) and by incorporating communication into the task graph. We describe the distinctive features of Specx and demonstrate its potential by running parallel applications.

Paper Structure

This paper contains 29 sections and 6 figures.

Figures (6)

  • Figure 1: Simplified view of a heterogeneous computing node with 2 CPUs and 4 GPUs. Multiple nodes can be interconnected via network.
  • Figure 2: Example of graphs and execution trace exported after a run.
  • Figure 3: Estimated overheads of the write ($\bullet$) and commutative-write ($+$) data accesses for different numbers of dependencies. We report the maximum overhead ($- -$) and the average overhead ($-$). The overhead is given for inserting a task, $I$ (left column), and for picking a task, $O$ (right column).
  • Figure 4: Performance results for the GEMM and Cholesky test cases. The $x$-axis represents the test case's size, and the $y$-axis represents the speedup over a sequential execution.
  • Figure 5: Performance results for the axpy and particle test cases. The $x$-axis represents the test case's size, and the $y$-axis represents the speedup over a sequential execution.
  • ...and 1 more figure