Improving Runtime Performance of Tensor Computations using Rust From Python
Kimmie Harding, Daniel M. Dunlavy
TL;DR
The paper tackles the performance bottlenecks of tensor kernels in the Python Tensor Toolbox (pyttb) by re-implementing selected kernels in Rust and exposing them to Python via PyO3. The authors compare Rust-from-Python against Python alone, Numba, and NumPy across kernels of increasing complexity, reporting speedups of over $2$ orders of magnitude for the vector dot product and dense matrix-vector product, and about $1$ order of magnitude for sparse tensor times vector (TTV). They analyze the overhead of the initial FFI/JIT invocation and consider memory-layout effects between Python (column-major) and Rust (row-major). The results demonstrate the practical viability of Rust extensions for speeding up tensor computations on CPUs and provide guidance for future work on memory layouts, more complex kernels such as MTTKRP, and potential GPU or concurrent implementations.
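The memory-layout point above can be made concrete with a small NumPy sketch. It is an illustration only, not code from the paper: the array names are hypothetical, and it assumes, as the summary states, that the Python-side data is held in column-major order while the Rust side expects row-major order. NumPy itself can store an array in either order, so the same logical matrix is built both ways and compared.

```python
import numpy as np

# The same logical 2x3 matrix stored in two memory orders.
a_row = np.arange(6, dtype=np.float64).reshape(2, 3)  # row-major (C order)
a_col = np.asfortranarray(a_row)                      # column-major (Fortran order)

# Logical contents are identical; only the byte strides differ.
same_values = np.array_equal(a_row, a_col)
row_strides = a_row.strides   # bytes to step along (rows, columns)
col_strides = a_col.strides

# Reading each array in its own memory order gives different element sequences,
# which is what a consumer walking the raw buffer would see.
row_sequence = a_row.ravel(order="C").tolist()
col_sequence = a_col.ravel(order="F").tolist()

# Handing column-major data to a row-major consumer needs an explicit copy
# (or the consumer must honor the strides).
converted = np.ascontiguousarray(a_col)
```

If a kernel walks the raw buffer assuming the wrong order, it either computes on transposed data or pays for a conversion copy such as `np.ascontiguousarray`, which is one plausible source of the layout effects mentioned above.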
Abstract
In this work, we investigate improving the runtime performance of key computational kernels in the Python Tensor Toolbox (pyttb), a package for analyzing tensor data across a wide variety of applications. Recent runtime performance improvements have been demonstrated using Rust, a compiled language, from Python via extension modules leveraging the Python C API, for example in web applications, data parsing, and data validation. Using this same approach, we study the runtime performance of key tensor kernels of increasing complexity, from simple kernels involving sums of products over data accessed through single and nested loops to more advanced tensor multiplication kernels that are key in low-rank tensor decomposition and tensor regression algorithms. In numerical experiments involving synthetically generated tensor data of various sizes and these tensor kernels, we demonstrate consistent improvements in runtime performance when using Rust from Python over 1) using Python alone, 2) using Python and the Numba just-in-time Python compiler (for loop-based kernels), and 3) using the NumPy Python package for scientific computing (for pyttb kernels).
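The three kernel classes named above (a single-loop sum of products, a nested-loop sum of products, and a sparse tensor times vector product) can be sketched in plain Python as baselines. This is a minimal sketch under stated assumptions, not the pyttb or Rust implementation: the function names are hypothetical, the sparse tensor is assumed to be given in coordinate (COO) form as a list of index tuples plus a list of values, and the TTV result is returned as a dense array for simplicity.

```python
import numpy as np

def dot_loop(x, y):
    """Single loop: sum of products of two equal-length vectors."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total

def matvec_loop(A, x):
    """Nested loops: dense matrix-vector product y = A x."""
    m, n = A.shape
    y = np.zeros(m)
    for i in range(m):
        for j in range(n):
            y[i] += A[i, j] * x[j]
    return y

def sparse_ttv_coo(subs, vals, shape, v, mode):
    """Contract a COO sparse tensor with vector v along `mode`.

    subs: iterable of index tuples, vals: matching nonzero values,
    shape: tuple of tensor dimensions. Returns a dense array over the
    remaining modes (a simplification made for this sketch).
    """
    out = np.zeros(shape[:mode] + shape[mode + 1:])
    for idx, val in zip(subs, vals):
        rest = tuple(idx[:mode]) + tuple(idx[mode + 1:])
        out[rest] += val * v[idx[mode]]
    return out

# Small usage example with hand-checkable inputs.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])
subs = [(0, 0, 0), (1, 0, 1), (1, 1, 1)]
vals = [1.0, 2.0, 3.0]
v = np.array([10.0, 100.0])
ttv_result = sparse_ttv_coo(subs, vals, (2, 2, 2), v, mode=2)
```

Loops of this shape are the kind that an interpreter executes slowly and that a compiled extension or a JIT compiler can speed up; the NumPy equivalents (`np.dot`, `A @ x`, `np.tensordot`) serve as a correctness check for the sketch.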
