ML2SC: Deploying Machine Learning Models as Smart Contracts on the Blockchain
Zhikai Li, Steve Vott, Bhaskar Krishnamachari
TL;DR
The paper addresses trustworthy, verifiable ML inference on blockchains by presenting ML2SC, a PyTorch-to-Solidity translator that converts MLPs trained off-chain into Solidity smart contracts. To preserve numerical accuracy, the generated contracts use the 59.18-decimal fixed-point arithmetic of the PRBMath library in place of floating-point computation. The paper also provides a JavaScript data loader and explicit gas-cost equations for deployment and inference, and shows that on-chain outputs match off-chain results while gas costs scale linearly with architecture size. The key contributions are the translator, the data loader, and architecture-aware gas modeling, validated by experiments on a Heart Attack dataset. The work offers a practical, open-source path to on-chain ML with verifiable results and sets the stage for extensions to more complex architectures and other blockchain platforms.
Abstract
With growing concern about AI safety, there is a need to trust the computations done by machine learning (ML) models. Blockchain technology, known for recording data and running computations transparently and in a tamper-proof manner, can offer this trust. One significant challenge in deploying ML classifiers on-chain is that while ML models are typically written in Python using an ML library such as PyTorch, smart contracts deployed on EVM-compatible blockchains are written in Solidity. We introduce Machine Learning to Smart Contract (ML2SC), a PyTorch-to-Solidity translator that automatically translates multi-layer perceptron (MLP) models written in PyTorch into equivalent Solidity smart contracts. ML2SC uses a fixed-point math library to approximate floating-point computation. After deploying the generated smart contract, we train the model off-chain using PyTorch and then transfer the learned weights and biases to the smart contract with a function call. Model inference is likewise performed with a function call that supplies the input. We mathematically model the gas costs of deploying the contract, updating model parameters, and running inference on-chain, showing that these costs increase linearly in several parameters of the MLP. We present empirical results that match our modeling. We also evaluate classification accuracy, showing that the outputs of our transparent on-chain implementation are identical to those of the original off-chain PyTorch implementation.
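To make the fixed-point approximation concrete, the following is a minimal Python sketch of an MLP forward pass carried out entirely on scaled integers. It is not the code ML2SC generates and does not use the PRBMath API. The helper names (`to_fixed`, `fixed_mul`, `mlp_fixed`), the toy weights, and the choice of ReLU as the hidden activation are all assumptions made for illustration. The sketch only shows why storing each value as an integer scaled by 10^18 can reproduce floating-point results closely.

```python
# Illustrative sketch only: names, weights, and the ReLU activation are
# assumptions, not taken from ML2SC or PRBMath.

SCALE = 10**18  # 18 decimal places, mirroring a 59.18-decimal fixed-point format


def to_fixed(x: float) -> int:
    """Encode a float as an integer scaled by SCALE."""
    return int(round(x * SCALE))


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values.

    The raw product carries SCALE twice, so divide once. The division
    truncates toward zero, as integer division does in Solidity
    (Python's // would floor instead).
    """
    prod = a * b
    quotient = abs(prod) // SCALE
    return quotient if prod >= 0 else -quotient


def mlp_fixed(layers, inputs):
    """Forward pass on fixed-point integers; layers is a list of (W, b)."""
    h = inputs
    for i, (weights, biases) in enumerate(layers):
        out = []
        for row, bias in zip(weights, biases):
            acc = bias
            for w, x in zip(row, h):
                acc += fixed_mul(w, x)
            out.append(acc)
        # Assumed activation: ReLU on every layer except the last.
        h = [max(v, 0) for v in out] if i < len(layers) - 1 else out
    return h


def mlp_float(layers, inputs):
    """Reference forward pass in ordinary floating point."""
    h = inputs
    for i, (weights, biases) in enumerate(layers):
        out = [sum(w * x for w, x in zip(row, h)) + bias
               for row, bias in zip(weights, biases)]
        h = [max(v, 0.0) for v in out] if i < len(layers) - 1 else out
    return h


# Toy 2-2-1 network with made-up parameters.
float_layers = [
    ([[0.5, -0.25], [1.5, 0.75]], [0.1, -0.2]),
    ([[1.0, -2.0]], [0.05]),
]
fixed_layers = [
    ([[to_fixed(w) for w in row] for row in W], [to_fixed(b) for b in B])
    for W, B in float_layers
]

float_out = mlp_float(float_layers, [2.0, 4.0])
fixed_out = mlp_fixed(fixed_layers, [to_fixed(2.0), to_fixed(4.0)])

print("float:", float_out[0])
print("fixed:", fixed_out[0] / SCALE)
```

Python integers have arbitrary precision, so this sketch cannot overflow. A Solidity implementation works within 256-bit integers, which is one reason to rely on an audited fixed-point library rather than hand-written scaling.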
