SYCL compute kernels for ExaHyPE

Chung Ming Loi, Heinrich Bockhorst, Tobias Weinzierl

TL;DR

Three SYCL realizations are explored for a block-structured Finite Volume kernel in ExaHyPE, mapping its compute graph to for-loops, nested parallelism, and a DAG-based task graph. The study evaluates patch-wise, batched, and task-graph realizations on NVIDIA A100 and Intel PVC GPUs, comparing data layouts and memory movement strategies while using the Rusanov flux for the Euler equations. The results show that a hybrid of task and data parallelism delivers the best performance when it is mapped onto a purely data-parallel SYCL implementation, while dynamic task graphs introduce substantial overhead. The work provides practical guidance on SYCL kernel orchestration for heterogeneous HPC codes and highlights ongoing challenges in nested parallelism and data management.

Abstract

We discuss three SYCL realisations of a simple Finite Volume scheme over multiple Cartesian patches. The realisation flavours differ in how they map the compute steps onto loops and tasks: we compare an implementation that exclusively uses a sequence of for-loops to a version that uses nested parallelism, and finally benchmark these against a version that models the calculations as a task graph. Our work proposes idioms to realise these flavours within SYCL. The results suggest that a mixture of classic task and data parallelism performs best if we map this hybrid onto a solely data-parallel SYCL implementation, taking into account SYCL specifics and the problem size.
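
To make the difference between the loop mappings more concrete, the following is a minimal sketch in standard C++ (not SYCL, and not the authors' code). It assumes a 1D patch of `p` volumes with one halo volume on each side, the 1D Euler equations with an ideal gas (`kGamma = 1.4`), and a first-order update with the Rusanov flux. All names (`State`, `Patch`, `updateVolume`, `updatePatchWise`, `updateBatched`) are hypothetical. The "batched" variant reflects one plausible reading of the term: the loops over patches and volumes are fused into a single flat iteration space, which is the kind of loop a data-parallel `parallel_for` could cover in one launch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Conserved 1D Euler state: density, momentum, total energy (assumed layout).
using State = std::array<double, 3>;
// One patch: p interior volumes plus one halo volume on each side.
using Patch = std::vector<State>;

constexpr double kGamma = 1.4;  // assumed ideal-gas adiabatic index

double pressure(const State& q) {
  return (kGamma - 1.0) * (q[2] - 0.5 * q[1] * q[1] / q[0]);
}

State physicalFlux(const State& q) {
  const double u = q[1] / q[0];
  const double p = pressure(q);
  return {q[1], q[1] * u + p, (q[2] + p) * u};
}

double maxWaveSpeed(const State& q) {
  return std::abs(q[1] / q[0]) + std::sqrt(kGamma * pressure(q) / q[0]);
}

// Rusanov (local Lax-Friedrichs) flux across the face between qL and qR.
State rusanovFlux(const State& qL, const State& qR) {
  const State fL = physicalFlux(qL);
  const State fR = physicalFlux(qR);
  const double lambda = std::max(maxWaveSpeed(qL), maxWaveSpeed(qR));
  State f{};
  for (std::size_t k = 0; k < 3; ++k)
    f[k] = 0.5 * (fL[k] + fR[k]) - 0.5 * lambda * (qR[k] - qL[k]);
  return f;
}

// Update of a single interior volume i from its left and right face fluxes.
void updateVolume(const Patch& in, Patch& out, std::size_t i, double dtOverDx) {
  const State fLeft = rusanovFlux(in[i - 1], in[i]);
  const State fRight = rusanovFlux(in[i], in[i + 1]);
  for (std::size_t k = 0; k < 3; ++k)
    out[i][k] = in[i][k] - dtOverDx * (fRight[k] - fLeft[k]);
}

// Patch-wise flavour: outer loop over patches, inner loop over volumes.
// On an accelerator, each outer iteration would correspond to its own launch.
void updatePatchWise(const std::vector<Patch>& in, std::vector<Patch>& out,
                     double dtOverDx) {
  assert(in.size() == out.size());
  for (std::size_t t = 0; t < in.size(); ++t)
    for (std::size_t i = 1; i + 1 < in[t].size(); ++i)
      updateVolume(in[t], out[t], i, dtOverDx);
}

// Batched flavour: one flat loop over all (patch, volume) pairs.
// Assumes every patch has the same size as the first one.
void updateBatched(const std::vector<Patch>& in, std::vector<Patch>& out,
                   double dtOverDx) {
  assert(in.size() == out.size());
  if (in.empty()) return;
  assert(in[0].size() >= 3);
  const std::size_t p = in[0].size() - 2;  // interior volumes per patch
  for (std::size_t idx = 0; idx < in.size() * p; ++idx)
    updateVolume(in[idx / p], out[idx / p], 1 + idx % p, dtOverDx);
}
```

Both variants call the same per-volume update and therefore produce identical results; they differ only in how the iteration space is exposed. Halo volumes are read but never written, so `out` should be initialised as a copy of `in` before either function is called.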

Paper Structure (22 sections, 3 equations, 11 figures, 7 algorithms)


Figures (11)

  • Figure 1: Sketch of the compute graph for a kernel over $T$ patches. Each node in the graph represents a $d$-dimensional loop over all volumes of the patch, including halo volumes where appropriate.
  • Figure 2: Partial sketch of the task graph fed into the task-graph realisation.
  • Figure 3: Cost per degree of freedom update for various $p$ and $T$ choices for $d=2$ on an NVIDIA A100 (top) or Intel PVC (bottom). Patch-wise realisation.
  • Figure 4: Breakdown of the total runtime for all patches from Fig. 3 into total kernel compute time and total kernel runtime including data transfer cost.
  • Figure 5: Normalised runtime for batched kernels on the A100 for $d=2$ (top) and $d=3$ (bottom).
  • ...and 6 more figures

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4