Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality
Hao Li, Gopi Krishnan Rajbahadur, Cor-Paul Bezemer
TL;DR
This work investigates how language bindings for the ML frameworks TensorFlow and PyTorch affect ML software quality. It systematically evaluates correctness and time cost for training and inference across bindings in C#, Rust, Python, and JavaScript, using five widely used deep learning models. The study finds that a model trained in one binding can be used for inference in another binding of the same framework without losing accuracy, and that non-default bindings can reduce training or inference time in certain tasks, though training curves may differ. The replication package enables reproduction and benchmarking for developers and researchers.
Abstract
Bindings for machine learning frameworks (such as TensorFlow and PyTorch) allow developers to integrate a framework's functionality using a programming language different from the framework's default language (usually Python). In this paper, we study the impact of using TensorFlow and PyTorch bindings in C#, Rust, Python, and JavaScript on software quality, measured in terms of correctness (training and test accuracy) and time cost (training and inference time), when training and performing inference on five widely used deep learning models. Our experiments show that a model can be trained in one binding and used for inference in another binding for the same framework without losing accuracy. Our study is the first to show that using a non-default binding can help improve machine learning software quality from the time cost perspective compared to the default Python binding while still achieving the same level of correctness.
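The two quality dimensions named in the abstract, correctness (accuracy) and time cost (wall-clock time), can be illustrated with a minimal Python sketch. This is not the authors' measurement harness: the helper names `accuracy` and `timed_inference`, the stand-in `toy_model`, and the toy data are all hypothetical, and a real comparison would call the binding under test in place of `toy_model`.

```python
import time
from typing import Callable, List, Sequence, Tuple


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Return the fraction of predictions that match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def timed_inference(
    predict: Callable[[float], int], inputs: Sequence[float]
) -> Tuple[List[int], float]:
    """Run `predict` on every input and return (predictions, elapsed seconds)."""
    start = time.perf_counter()
    predictions = [predict(x) for x in inputs]
    elapsed = time.perf_counter() - start
    return predictions, elapsed


def toy_model(x: float) -> int:
    """Hypothetical stand-in for a trained model: threshold at 0.5."""
    return 1 if x >= 0.5 else 0


# Hypothetical toy data, used only to exercise the helpers.
inputs = [0.1, 0.7, 0.4, 0.9]
labels = [0, 1, 1, 1]

preds, seconds = timed_inference(toy_model, inputs)
test_accuracy = accuracy(preds, labels)
print(f"test accuracy: {test_accuracy:.2f}, inference time: {seconds:.6f} s")
```

Under these assumptions, comparing two bindings would amount to running the same inputs through each one and comparing the resulting accuracy and elapsed-time pairs.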
