Enabling MPI communication within Numba/LLVM JIT-compiled Python code using numba-mpi v1.0

Kacper Derlatka, Maciej Manna, Oleksii Bulenok, David Zwicker, Sylwester Arabas

TL;DR

The numba-mpi package offers access to the Message Passing Interface (MPI) routines from Python code that uses the Numba just-in-time (JIT) compiler, allowing high-performance and multi-threaded Python code to utilize MPI communication facilities without leaving the JIT-compiled code blocks.

Abstract

The numba-mpi package offers access to the Message Passing Interface (MPI) routines from Python code that uses the Numba just-in-time (JIT) compiler. As a result, high-performance and multi-threaded Python code may utilize MPI communication facilities without leaving the JIT-compiled code blocks, which is not possible with the mpi4py package, a higher-level Python interface to MPI. For debugging purposes, numba-mpi retains the full functionality of the code even if JIT compilation is disabled. The numba-mpi API constitutes a thin wrapper around the C API of MPI and is built around NumPy arrays, including handling of non-contiguous views over array slices. Project development is hosted on GitHub and leverages the mpi4py/setup-mpi workflow, which enables continuous integration tests on Linux (MPICH, Open MPI & Intel MPI), macOS (MPICH & Open MPI) and Windows (MS MPI). The paper gives an overview of the package features, architecture and performance. As of v1.0, the following MPI routines are exposed and covered by unit tests: size/rank, [i]send/[i]recv, wait[all|any], test[all|any], allreduce, bcast, barrier, scatter/[all]gather & wtime. The package is implemented in pure Python and depends on numpy, numba and mpi4py (the latter used at initialization and as a source of utility routines only). The performance advantage of numba-mpi over mpi4py is demonstrated with a simple example, with the entirety of the code included in listings discussed in the text. Application of numba-mpi to domain decomposition in numerical solvers for partial differential equations is presented using two external packages that depend on numba-mpi: py-pde and PyMPDATA-MPI.
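
To make the domain-decomposition use case concrete, the following is a minimal sketch of how a one-dimensional grid might be split into contiguous blocks, one per MPI process. It is not taken from numba-mpi, py-pde or PyMPDATA-MPI: the helper name `subdomain_bounds` and its parameters are hypothetical, and the code deliberately runs without MPI. In an actual parallel run, `rank` and `size` would be obtained from the size/rank routines listed in the abstract.

```python
def subdomain_bounds(n_points, rank, size):
    """Return the half-open index range [start, stop) owned by `rank`.

    Hypothetical helper for illustration only: splits `n_points` grid
    points into `size` contiguous blocks whose lengths differ by at most one.
    """
    base, extra = divmod(n_points, size)
    # the first `extra` ranks each take one additional point
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Example: 10 grid points split among 3 workers (no MPI needed here;
# the loop over `rank` stands in for the individual processes).
bounds = [subdomain_bounds(10, rank, 3) for rank in range(3)]
print(bounds)  # [(0, 4), (4, 7), (7, 10)]
```

Each process would then update only its own index range and exchange boundary ("halo") values with its neighbours, for instance through the send/recv routines. This block layout is only one possible choice and is an assumption of the sketch.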

Paper Structure

This paper contains 13 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Up to three-fold speedup obtained by using numba-mpi instead of mpi4py to avoid leaving the JIT-compiled code blocks. Figure created using the script in Listing \ref{lst:timing} (which uses the code in Listings \ref{lst:hello_world}-\ref{lst:numba_mpi}).
  • Figure 2: Runtime $t$ of the core calculation in Listing \ref{lst:py-pde} as a function of the number $N$ of MPI cores. The predicted scaling $t \propto N^{-1}$ is indicated by the dotted line. Standard deviations of the runtimes, determined from three repeated runs, are smaller than the symbol size.
  • Figure 3: Different domain decomposition layouts tested in PyMPDATA-MPI with multi-threading (3 threads in all cases, dotted lines) and multi-processing (2 processes) carried out either along the same or distinct dimensions. The simulation setup involves a "hello-world" homogeneous advection problem with periodic boundary conditions.