OMP4Py: a pure Python implementation of OpenMP

César Piñeiro, Juan C. Pichel

TL;DR

OMP4Py brings OpenMP-style threading to Python through a directive-based source transformation and a Python-native runtime. It implements the OpenMP 3.0 API, both directives and runtime calls, entirely in Python, so programs can express parallel regions, worksharing, and tasks with the directive syntax familiar from C, C++, and Fortran. Experimental results on Python 3.13 show limited scalability for numerical workloads because of interpreter threading constraints, while non-numerical workloads and hybrid mpi4py configurations scale more effectively, illustrating practical HPC potential within Python. The work points to future extensions to OpenMP 4–6 and to further optimizations that support accelerators and reduce synchronization overhead.

Abstract

Python performs worse than traditional high-performance computing (HPC) languages such as C, C++, and Fortran. This performance gap is largely due to Python's interpreted nature and the Global Interpreter Lock (GIL), which hampers multithreading efficiency. However, the latest version of Python includes the changes needed to make the interpreter thread-safe, allowing Python code to run without the GIL. This important update will enable users to fully exploit multithreaded parallelism in Python. To facilitate that task, this paper introduces OMP4Py, the first pure Python implementation of OpenMP. We demonstrate that it is possible to bring OpenMP's familiar directive-based parallelization paradigm to Python, allowing developers to write parallel code with the same level of control and flexibility as in C, C++, or Fortran. The experimental evaluation shows that OMP4Py significantly impacts the performance of various types of applications, although the current threading limitations of Python's interpreter (v3.13) reduce its effectiveness for numerical applications.
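To make the directive-based paradigm concrete, the sketch below hand-writes what an OpenMP-style `parallel for` with a `+` reduction amounts to, using only Python's standard `threading` module and the Monte Carlo $\pi$ estimate that Figure 1 uses as its example. It is not OMP4Py's API or its generated code: the function name `estimate_pi`, its parameters, and the cyclic static schedule are assumptions chosen for illustration.

```python
import random
import threading


def estimate_pi(n_samples: int, n_threads: int = 4, seed: int = 0) -> float:
    """Estimate pi by Monte Carlo sampling split across plain Python threads.

    Hypothetical illustration only; OMP4Py generates its own code.
    """
    total_hits = 0            # shared reduction variable
    lock = threading.Lock()   # protects the final combine step

    def worker(thread_id: int) -> None:
        nonlocal total_hits
        # Static cyclic schedule: thread t owns iterations t, t + n_threads, ...
        my_iterations = len(range(thread_id, n_samples, n_threads))
        rng = random.Random(seed + thread_id)  # private generator per thread
        local_hits = 0                         # private copy of the reduction variable
        for _ in range(my_iterations):
            x, y = rng.random(), rng.random()
            if x * x + y * y <= 1.0:
                local_hits += 1
        # Reduction: fold the private count into the shared total.
        with lock:
            total_hits += local_hits

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # plays the role of the barrier that ends the parallel region

    return 4.0 * total_hits / n_samples


if __name__ == "__main__":
    print(estimate_pi(200_000, n_threads=4, seed=1))
```

With a GIL-enabled interpreter the threads in this sketch take turns rather than run simultaneously, which is the constraint the abstract describes; on a free-threaded build the same code can use several cores. A directive-based tool aims to let the user write only the serial loop plus an annotation and have this boilerplate produced automatically.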

Paper Structure

This paper contains 18 sections, 14 figures.

Figures (14)

  • Figure 1: Example of a Monte Carlo method for $\pi$ calculation using OMP4Py.
  • Figure 2: Example of the parallel directive: user code (top) and its corresponding translation by OMP4Py (bottom).
  • Figure 3: Example of the for directive: user code (top) and its corresponding translation by OMP4Py (bottom).
  • Figure 4: Example of the for directive with collapse and lastprivate clauses: user code (top) and its corresponding translation by OMP4Py (bottom).
  • Figure 5: Example of the sections directive: user code (top) and its corresponding translation by OMP4Py (bottom).
  • ...and 9 more figures