Enabling MPI communication within Numba/LLVM JIT-compiled Python code using numba-mpi v1.0

Kacper Derlatka; Maciej Manna; Oleksii Bulenok; David Zwicker; Sylwester Arabas

TL;DR

The numba-mpi package offers access to the Message Passing Interface (MPI) routines from Python code that uses the Numba just-in-time (JIT) compiler, allowing high-performance and multi-threaded Python code to utilize MPI communication facilities without leaving the JIT-compiled code blocks.

Abstract

The numba-mpi package offers access to the Message Passing Interface (MPI) routines from Python code that uses the Numba just-in-time (JIT) compiler. As a result, high-performance and multi-threaded Python code may utilize MPI communication facilities without leaving the JIT-compiled code blocks, which is not possible with the mpi4py package, a higher-level Python interface to MPI. For debugging purposes, numba-mpi retains full functionality of the code even if the JIT compilation is disabled. The numba-mpi API constitutes a thin wrapper around the C API of MPI and is built around NumPy arrays, including handling of non-contiguous views over array slices. Project development is hosted on GitHub and leverages the mpi4py/setup-mpi workflow, which enables continuous integration tests on Linux (MPICH, OpenMPI & Intel MPI), macOS (MPICH & OpenMPI) and Windows (MS MPI). The paper gives an overview of the package features, architecture and performance. As of v1.0, the following MPI routines are exposed and covered by unit tests: size/rank, [i]send/[i]recv, wait[all|any], test[all|any], allreduce, bcast, barrier, scatter/[all]gather & wtime. The package is implemented in pure Python and depends on numpy, numba and mpi4py (the latter used at initialization and as a source of utility routines only). The performance advantage of using numba-mpi compared to mpi4py is depicted with a simple example, with the entirety of the code included in listings discussed in the text. Application of numba-mpi for handling domain decomposition in numerical solvers for partial differential equations is presented using two external packages that depend on numba-mpi: py-pde and PyMPDATA-MPI.
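The abstract's mention of "non-contiguous views over array slices" can be illustrated with NumPy alone. The sketch below does not call numba-mpi and does not describe its internals; the array `field` and the idea of packing a slice with `np.ascontiguousarray` are assumptions made purely for illustration of what a non-contiguous view is.

```python
import numpy as np

# Hypothetical 2-D field, e.g. a subdomain whose edges would be exchanged
# with neighbouring processes in a domain-decomposed solver.
field = np.arange(12, dtype=np.float64).reshape(3, 4)

# Slicing along the last axis of a C-ordered array gives a contiguous view.
row = field[1, :]

# Slicing along the first axis gives a strided, non-contiguous view:
# consecutive elements are one full row (4 * 8 bytes) apart in memory.
column = field[:, 1]

# Both slices are views that share memory with `field`; nothing is copied.
shares_memory = np.shares_memory(field, column)

# One possible way (an assumption, not numba-mpi's documented strategy) to
# obtain a plain buffer for a C routine is an explicit contiguous copy.
packed = np.ascontiguousarray(column)

print(row.flags.c_contiguous, column.flags.c_contiguous, column.strides)
```

The point of the sketch is only that a slice such as `field[:, 1]` cannot be passed as a single flat buffer without either stride information or a copy, which is why support for such views is worth stating explicitly for a wrapper around the C API of MPI.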

Paper Structure (13 sections, 2 equations, 3 figures, 2 tables)

This paper contains 13 sections, 2 equations, 3 figures, 2 tables.

Background and motivation
Accelerating Python code with Numba
Using MPI from Python with mpi4py and its incompatibility with Numba
Software description, usage basics and performance
Using MPI from Python with numba-mpi
Performance gain from using numba-mpi as compared to mpi4py
Philosophy and current status of the API
numba-mpi package availability
Test suite and workflows
Illustrative examples of use in external software
Usage of numba-mpi in py-pde
Usage of numba-mpi in PyMPDATA-MPI
Summary and outlook

Figures (3)

Figure 1: Depiction of up to three-fold speedup obtained by using numba-mpi instead of mpi4py to avoid leaving the JIT-compiled code blocks. Figure created using the script in Listing \ref{lst:timing} (and using code Listings \ref{lst:hello_world}-\ref{lst:numba_mpi}).
Figure 2: Runtime $t$ of the core calculation in Listing \ref{lst:py-pde} as a function of the number $N$ of MPI cores. The predicted scaling $t \propto N^{-1}$ is indicated by the dotted line. Standard deviations of the runtimes, determined from three repeated runs, are smaller than the symbol size.
Figure 3: Different domain decomposition layouts tested in PyMPDATA-MPI with multi-threading (3 threads in all cases, dotted lines) and multi-processing (2 processes) carried out either along the same or distinct dimensions. The simulation setup involves a "hello-world" homogeneous advection problem with periodic boundary conditions.
