VeriSplit: Secure and Practical Offloading of Machine Learning Inferences across IoT Devices
Han Zhang, Zifan Wang, Mihir Dhamankar, Matt Fredrikson, Yuvraj Agarwal
TL;DR
The paper tackles the challenge of offloading deep ML inferences from low-cost IoT devices to locally available devices without compromising user privacy, model secrets, or result integrity. It introduces VeriSplit, a framework that uses linear input masking for data privacy, mask-based parameter confidentiality with two non-colluding workers, and a commitment-based verification scheme built on Merkle trees that separates verification from the inference path and enables asynchronous or partial checks. Unlike prior cryptographic or TEE-based approaches, VeriSplit preserves floating-point model fidelity and requires no special hardware. In tests spanning diverse models, including Vision Transformers and CNNs, it reduces inference latency by up to 83% compared with local execution. The work demonstrates that secure, private, and practical cross-device ML offloading is achievable in home IoT settings, enabling cost savings and better use of idle local compute resources.
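The idea behind linear input masking can be illustrated on a single linear layer. Because W(x + r) = Wx + Wr, a device can send the masked input x + r to an untrusted worker and then subtract Wr to recover Wx. The worker never sees x. The sketch below is a minimal illustration under stated assumptions, not VeriSplit's implementation. It covers only one linear layer (nonlinear layers are not shown), it assumes the worker already holds W (so it does not model parameter confidentiality), and it assumes Wr is precomputed ahead of time, for example while the device is idle. All function names are hypothetical.

```python
import random


def matvec(weights, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def mask_input(x, rng):
    """Device side (hypothetical): add a fresh random mask r to the private input x."""
    r = [rng.gauss(0.0, 1.0) for _ in x]
    masked = [xi + ri for xi, ri in zip(x, r)]
    return masked, r


def worker_linear(weights, masked):
    """Untrusted worker (hypothetical): sees only x + r and returns W(x + r)."""
    return matvec(weights, masked)


def unmask_output(worker_out, weights, r):
    """Device side (hypothetical): subtract W r to recover W x.

    Assumption: W r would be precomputed offline in practice. It is
    computed inline here only to keep the sketch self-contained.
    """
    wr = matvec(weights, r)
    return [yo - wri for yo, wri in zip(worker_out, wr)]
```

Usage: given `W` and a private `x`, call `masked, r = mask_input(x, random.Random(0))`, pass `masked` to `worker_linear`, then call `unmask_output(..., W, r)`. The result matches `matvec(W, x)` up to floating-point rounding. This shows why the approach can stay in floating point rather than working over finite field elements.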
Abstract
Many Internet-of-Things (IoT) devices rely on cloud computation resources to perform machine learning inferences. This is expensive and may raise privacy concerns for users. Consumers of these devices often own hardware such as gaming consoles and PCs with graphics accelerators that are capable of performing these computations and that may sit idle for significant periods of time. While this presents a compelling alternative to cloud offloading, concerns about the integrity of inferences, the confidentiality of model parameters, and the privacy of users' data mean that device vendors may be hesitant to offload their inferences to a platform managed by another manufacturer. We propose VeriSplit, a framework for offloading machine learning inferences to locally available devices that addresses these concerns. We introduce masking techniques to protect data privacy and model confidentiality, and a commitment-based verification protocol to ensure integrity. Unlike much prior work aimed at these issues, our approach does not rely on computation over finite field elements, which may interfere with floating-point computation support on hardware accelerators and may require modifications to existing models. We implemented a prototype of VeriSplit, and our evaluation shows that, compared to performing computation locally, our secure and private offloading solution reduces inference latency by 28% to 83%.
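A Merkle-tree commitment helps explain how verification can be separated from the inference path. A worker returns its result together with a single root hash that commits to all of its intermediate outputs. The device can use the result right away and later, possibly asynchronously, ask for any one committed item plus a short proof, check it against the root, and recompute that piece locally. The sketch below is a generic Merkle commitment written under stated assumptions, not VeriSplit's protocol. Treating each leaf as one layer's serialized output is a hypothetical choice, and the function names, the SHA-256 hash, and the leaf/node domain-separation prefixes are illustrative.

```python
import hashlib


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _leaf_level(leaves):
    # Prefix 0x00 for leaves and 0x01 for inner nodes (domain separation).
    return [_hash(b"\x00" + leaf) for leaf in leaves]


def _next_level(level):
    if len(level) % 2 == 1:
        level = level + [level[-1]]  # duplicate the last node on odd levels
    return [_hash(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Worker side (hypothetical): commit to all leaves with a single root hash."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = _leaf_level(leaves)
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Worker side (hypothetical): sibling path for leaf `index` as (hash, is_left) pairs."""
    level = _leaf_level(leaves)
    proof = []
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 == 1 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Device side (hypothetical): check one revealed leaf against the committed root."""
    node = _hash(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _hash(b"\x01" + pair)
    return node == root
```

Because a proof has only logarithmically many hashes in the number of leaves, spot-checking a few randomly chosen leaves is cheap. This is one way partial checks can trade verification cost against detection probability.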
