Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality

Hao Li; Gopi Krishnan Rajbahadur; Cor-Paul Bezemer

TL;DR

This work investigates how language bindings for the ML frameworks TensorFlow and PyTorch affect ML software quality. It systematically evaluates the correctness and time cost of training and inference across bindings in C#, Rust, Python, and JavaScript using five models. The study finds that a model trained in one binding can be used for inference in another binding of the same framework without losing accuracy, and that non-default bindings can reduce training or inference time in certain tasks, though training curves may differ. The replication package enables reproduction and benchmarking for developers and researchers.

Abstract

Bindings for machine learning frameworks (such as TensorFlow and PyTorch) allow developers to integrate a framework's functionality using a programming language different from the framework's default language (usually Python). In this paper, we study the impact of using TensorFlow and PyTorch bindings in C#, Rust, Python and JavaScript on the software quality in terms of correctness (training and test accuracy) and time cost (training and inference time) when training and performing inference on five widely used deep learning models. Our experiments show that a model can be trained in one binding and used for inference in another binding for the same framework without losing accuracy. Our study is the first to show that using a non-default binding can help improve machine learning software quality from the time cost perspective compared to the default Python binding while still achieving the same level of correctness.

Paper Structure (30 sections, 1 equation, 7 figures, 7 tables, 4 algorithms)

Introduction
Background
ML Frameworks
Bindings for the ML frameworks
Study Design
Environment setting
Studied datasets and models
Studied ML frameworks
Studied Bindings
Correctness evaluation
Time cost evaluation
Experimental setup
Supported features in studied bindings
Correctness Evaluation
Time Cost Evaluation
...and 15 more sections

Figures (7)

Figure 1: Bindings use the functionality of ML frameworks via foreign function interfaces (FFIs) to train models and perform model inference.
Figure 2: Overview of the study design.
Figure 3: Mean training accuracy curves of LeNet-1, LeNet-5, VGG-16, LSTM, and GRU on GPU in bindings for TensorFlow (first row) and PyTorch (second row).
Figure 4: All bindings load the trained models that are saved by the default Python bindings for ML frameworks.
Figure 5: Results of reproducing the test accuracy of pre-trained models in TensorFlow and PyTorch bindings on the CPU and GPU (the results are identical). Note: the failed cases in PyTorch's C# binding were fixed in a newer version of the binding.
...and 2 more figures
