The Awkward World of Python and C++

Manasvi Goyal, Ianna Osborne, Jim Pivarski

TL;DR

The paper tackles the challenge of bridging Python and C++ for large-scale scientific analyses. It introduces header-only, ABI-free C++ libraries that construct Awkward Arrays from C++ data and expose them to Python via raw buffers and a JSON Form. The central components, LayoutBuilder and GrowableBuffer, are implemented as header-only C++ templates that contain no Python bindings. The resulting buffers, Form, and length can be passed to Python by a separate binding layer, such as one written with pybind11. This approach enables seamless Python interop and supports JIT workflows in ROOT. It also lets the Awkward Array ecosystem extend to domains beyond HEP while avoiding platform-specific linking. Overall, the method improves portability, simplifies integration into diverse projects, and provides a standalone C++ package that feeds data into Python-based analysis.
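To make "raw buffers and a JSON Form" concrete, here is a minimal C++17 sketch of what crosses the language boundary for a jagged array of doubles. It produces an offsets buffer, a flat content buffer, a Form string, and a length, all without Python headers. The names `ExportedArray`, `export_jagged`, and `to_bytes` are hypothetical. The Form follows Awkward Array v2's JSON Form layout for a `ListOffsetArray` of `float64`, and the buffer keys assume Awkward's default `"{form_key}-{attribute}"` naming.

```cpp
// Hypothetical sketch (names are illustrative, not Awkward Array's API):
// flatten std::vector<std::vector<double>> into named raw buffers,
// a JSON Form string, and a length. No Python or pybind11 headers are used.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct ExportedArray {
    std::string form;                                     // JSON Form describing the layout
    int64_t length = 0;                                   // number of outer lists
    std::map<std::string, std::vector<uint8_t>> buffers;  // raw bytes keyed by buffer name
};

// Copy a typed vector into a byte buffer (host byte order).
template <typename T>
std::vector<uint8_t> to_bytes(const std::vector<T>& values) {
    const auto* first = reinterpret_cast<const uint8_t*>(values.data());
    return std::vector<uint8_t>(first, first + values.size() * sizeof(T));
}

ExportedArray export_jagged(const std::vector<std::vector<double>>& lists) {
    std::vector<int64_t> offsets{0};  // list i spans content[offsets[i], offsets[i+1])
    std::vector<double> content;      // all inner values stored back to back
    for (const auto& list : lists) {
        content.insert(content.end(), list.begin(), list.end());
        offsets.push_back(static_cast<int64_t>(content.size()));
    }
    ExportedArray out;
    out.form =
        R"({"class": "ListOffsetArray", "offsets": "i64", "form_key": "node0", )"
        R"("content": {"class": "NumpyArray", "primitive": "float64", "form_key": "node1"}})";
    out.length = static_cast<int64_t>(lists.size());
    // Assumed key scheme: "<form_key>-<attribute>".
    out.buffers["node0-offsets"] = to_bytes(offsets);
    out.buffers["node1-data"] = to_bytes(content);
    return out;
}
```

On the Python side, a thin binding layer could turn `buffers` into a dict of bytes-like objects and call `ak.from_buffers(form, length, buffers)`. Check the key naming against the installed Awkward version.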

Abstract

There are undeniable benefits to binding Python and C++ to take advantage of the best features of both languages. This is especially relevant to HEP and other scientific communities that have invested heavily in C++ frameworks and are rapidly moving their data analyses to Python. Version 2 of Awkward Array, a Scikit-HEP Python library, introduces a set of header-only C++ libraries that do not depend on any application binary interface (ABI). Users can include these libraries directly in their compilation instead of linking against platform-specific libraries. This makes integrating Awkward Arrays into other projects easier and more portable, because the implementation is cleanly separable from the rest of the Awkward Array codebase. The code is minimal: it does not include all of the code needed to use Awkward Arrays in Python, nor does it reference Python or pybind11. C++ users can use it to build arrays and then copy them to Python without any specialized data types, using only raw buffers, strings, and integers. This C++ code also simplifies just-in-time (JIT) compilation in ROOT. The approach also addresses drawbacks such as the difficulty of packaging projects that have native dependencies. In this paper, we demonstrate how to integrate C++ and Python using a header-only approach. We also describe the implementation of a new LayoutBuilder and GrowableBuffer. Furthermore, we discuss examples of wrapping C++ data into Awkward Arrays and of exposing Awkward Arrays to C++ without copying them.
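The abstract's idea of a header-only, template-based builder can be sketched in a few dozen lines. Builders nest at compile time, for example `ListOffsetBuilder<NumpyBuilder<double>>`, and each node emits its own piece of the JSON Form. This is a minimal C++17 sketch under our own assumptions. `NumpyBuilder`, `ListOffsetBuilder`, `begin_list`, `end_list`, and `form` are hypothetical names that only mirror the general shape of a LayoutBuilder; they are not Awkward Array's actual classes or signatures.

```cpp
// Hypothetical header-only builder sketch (not Awkward Array's real API).
// Builders compose via templates; each node fills its own buffers and Form.
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

// Leaf node: a flat column of primitives.
template <typename T>
class NumpyBuilder {
    static_assert(std::is_same_v<T, double> || std::is_same_v<T, int64_t>,
                  "this sketch supports double and int64_t only");
public:
    void append(T x) { data_.push_back(x); }
    std::size_t length() const { return data_.size(); }
    const std::vector<T>& data() const { return data_; }
    // Emit this node's Form, consuming the next form_key number.
    std::string form(int& key) const {
        return R"({"class": "NumpyArray", "primitive": ")" + primitive() +
               R"(", "form_key": "node)" + std::to_string(key++) + R"("})";
    }
private:
    static std::string primitive() {
        if constexpr (std::is_same_v<T, double>) return "float64";
        else return "int64";
    }
    std::vector<T> data_;
};

// Variable-length lists of any Content builder (including other lists).
template <typename Content>
class ListOffsetBuilder {
public:
    ListOffsetBuilder() : offsets_{0} {}
    Content& begin_list() { return content_; }  // append items to the returned builder
    void end_list() { offsets_.push_back(static_cast<int64_t>(content_.length())); }
    std::size_t length() const { return offsets_.size() - 1; }
    const std::vector<int64_t>& offsets() const { return offsets_; }
    const Content& content() const { return content_; }
    // Parent takes the next key, then recurses into the content's Form.
    std::string form(int& key) const {
        std::string own = "node" + std::to_string(key++);
        return R"({"class": "ListOffsetArray", "offsets": "i64", "form_key": ")" + own +
               R"(", "content": )" + content_.form(key) + "}";
    }
    std::string form() const { int key = 0; return form(key); }
private:
    std::vector<int64_t> offsets_;
    Content content_;
};
```

Because everything is a template in a header, the layout type is fixed at compile time and nothing needs to be linked. This is the property that makes such code easy to JIT-compile in ROOT.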

Paper Structure

This paper contains 10 sections and 2 figures.

Figures (2)

  • Figure 1: Structure of an Awkward Array with nested variable-length lists and records, color-coded with an array example.
  • Figure 2: Awkward Array GrowableBuffer implemented as a linked list of panels, each of size 5, that are allocated as needed, i.e., when the GrowableBuffer runs out of space.
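To make the panel scheme of Figure 2 concrete, here is a minimal C++17 sketch of a growable buffer built as a singly linked list of fixed-size panels. A new panel is allocated only when the current one is full. `PanelBuffer`, `append`, `num_panels`, and `concatenate` are hypothetical names chosen for illustration, not Awkward Array's `GrowableBuffer` API. We also assume every panel has the same size (5 in the test, matching the figure), which the real implementation may not do.

```cpp
// Hypothetical sketch of a panel-based growable buffer (names illustrative).
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>

template <typename T>
class PanelBuffer {
public:
    explicit PanelBuffer(std::size_t panel_size)
        : panel_size_(std::max<std::size_t>(panel_size, 1)) {
        head_ = std::make_unique<Panel>(panel_size_);
        tail_ = head_.get();
    }
    ~PanelBuffer() {
        // Unlink panels one at a time to avoid deep recursive destruction.
        std::unique_ptr<Panel> p = std::move(head_);
        while (p) p = std::move(p->next);
    }
    void append(const T& x) {
        if (tail_->used == panel_size_) {  // current panel is full: chain a new one
            tail_->next = std::make_unique<Panel>(panel_size_);
            tail_ = tail_->next.get();
            ++num_panels_;
        }
        tail_->data[tail_->used++] = x;    // data already stored is never moved
        ++length_;
    }
    std::size_t length() const { return length_; }
    std::size_t num_panels() const { return num_panels_; }
    // Copy all panels, in order, into `out`, which must hold length() elements.
    void concatenate(T* out) const {
        for (const Panel* p = head_.get(); p != nullptr; p = p->next.get()) {
            out = std::copy(p->data.get(), p->data.get() + p->used, out);
        }
    }
private:
    struct Panel {
        explicit Panel(std::size_t n) : data(new T[n]) {}
        std::unique_ptr<T[]> data;
        std::size_t used = 0;
        std::unique_ptr<Panel> next;
    };
    std::size_t panel_size_;
    std::unique_ptr<Panel> head_;
    Panel* tail_ = nullptr;
    std::size_t length_ = 0;
    std::size_t num_panels_ = 1;
};
```

The design choice this illustrates is that appending never copies existing data. A `std::vector`, by contrast, must move every stored element each time it reallocates. In this sketch the single copy happens at the end, when `concatenate` produces the contiguous buffer that can be handed to Python.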