Detrimental task execution patterns in mainstream OpenMP runtimes
Adam S. Tuft, Tobias Weinzierl, Michael Klemm
TL;DR
The paper analyzes a stationary black-hole simulation with Otter-based tracing to expose detrimental task execution patterns in mainstream OpenMP runtimes. It identifies four patterns that harm the critical path: premature task activation, lack of embedded parallelism, unfair yields, and throughput-biased waiting. To address them, the authors propose prescriptive API extensions (e.g., deferrable tasks, priority-enabled taskloop, and latency/throughput taskyield qualifiers) and outline practical realizations that rely on modest runtime changes and task-priority management. The work aims to give developers more explicit control over task scheduling, enabling incremental performance improvements for task-heavy HPC codes while inviting broader evaluation across runtimes and domains.
Abstract
The OpenMP API offers both task-based and data-parallel concepts to scientific computing. While it provides descriptive and prescriptive annotations, it is in many places deliberately unspecific about how to implement its annotations. As the predominant OpenMP implementations share design rationales, they introduce "quasi-standards" for how certain annotations behave. By means of a task-based astrophysical simulation code, we highlight situations where this "quasi-standard" reference behaviour introduces performance flaws. Therefore, we propose prescriptive clauses to constrain the OpenMP implementations. Simulated task traces uncover the clauses' potential, while a discussion of their realization highlights that they would manifest in rather incremental changes to any OpenMP runtime supporting task priorities.
