MPC-Pipe: an Efficient Pipeline Scheme for Secure Multi-party Machine Learning Inference

Yongqin Wang; Rachit Rajat; Murali Annavaram

TL;DR

MPC-Pipe tackles the overheads of secure MPC for ML by identifying and removing unnecessary serialization between computation and communication in secret-sharing MPC. It introduces three pipeline schemes—inter-linear, inner-layer, and inter-batch—to overlap online-phase operations during inference and training, implemented on a modified CrypTen with separated offline/online phases. Empirical evaluation on CNNs and Transformers shows substantial end-to-end gains in throughput (up to ~50%) and latency reductions (up to ~16%) across LAN and WAN settings, with additional improvements in resource utilization and scalability to more parties. The approach preserves accuracy, requires no changes to MPC protocols, and leverages domain-specific data-flow characteristics to achieve practical, end-to-end improvements for privacy-preserving ML workloads.

Abstract

Multi-party computing (MPC) has been gaining popularity as a secure computing model over the past few years. However, prior works have demonstrated that MPC protocols still pay substantial performance penalties compared to plaintext, particularly when applied to ML algorithms. The overhead is due to added computation and communication costs. Prior studies, as well as our own analysis, found that most MPC protocols today perform communication and computation sequentially. The participating parties must compute on their shares first and then communicate to distribute the new secret shares before proceeding to the next computation step. In this work, we show that this serialization is unnecessary, particularly in the context of ML computations (both in convolutional neural networks and in Transformer-based models). We demonstrate that the computation and communication steps can be carefully orchestrated to overlap. We propose MPC-Pipe, an efficient MPC system for both training and inference of ML workloads, which pipelines computations and communications in an MPC protocol during the online phase. MPC-Pipe proposes three pipeline schemes to optimize the online phase of ML in the semi-honest majority adversary setting. We implement MPC-Pipe by augmenting a modified version of CrypTen, which separates online and offline phases. We evaluate the end-to-end performance benefits for the online phase of MPC on deep neural networks (VGG16, ResNet50) and Transformers under different network settings. We show that MPC-Pipe can improve the throughput and latency of ML workloads.
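
The benefit of overlapping computation with communication can be illustrated with a small timing model. The sketch below is not MPC-Pipe's implementation: it assumes the work has been split into independent chunks (for example tiles or batches), that one compute unit and one network link each handle one chunk at a time, and the per-chunk costs in `chunks` are hypothetical numbers chosen for illustration only.

```python
from typing import Sequence, Tuple

# Each chunk is (compute_time, communication_time) in arbitrary time units.
Chunk = Tuple[float, float]


def serial_time(chunks: Sequence[Chunk]) -> float:
    """Total time when every chunk computes and then communicates, back to back."""
    return sum(comp + comm for comp, comm in chunks)


def pipelined_time(chunks: Sequence[Chunk]) -> float:
    """Total time when the compute of chunk i+1 may overlap the communication of chunk i.

    Assumption (ours): chunks are independent, so the only constraints are that
    a chunk communicates after its own compute finishes and that each resource
    processes chunks in order, one at a time.
    """
    compute_done = 0.0  # time at which the compute unit becomes free
    comm_done = 0.0     # time at which the network link becomes free
    for comp, comm in chunks:
        compute_done += comp
        # Communication starts once the data is ready and the link is idle.
        comm_done = max(comm_done, compute_done) + comm
    return comm_done


# Hypothetical workload: four chunks, each 2 units of compute and 3 of communication.
chunks = [(2.0, 3.0)] * 4
serial = serial_time(chunks)
pipelined = pipelined_time(chunks)
print(f"serial={serial}, pipelined={pipelined}")
```

In this model the pipelined schedule is bounded by the slower of the two resources plus a fill or drain term, which is why overlapping helps most when computation and communication costs are comparable.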

Paper Structure (31 sections, 3 equations, 10 figures, 6 tables, 4 algorithms)

Introduction
Key observations about MPC frameworks
Our Contribution
Related Works
MPC Operation Optimizations
Other Privacy Preserving Mechanism
Background
Secret Sharing
Beaver Triple Assisted Operations
The number of parties
MPC-Pipe
Inter-linear pipeline for inference and training
Inner-layer pipeline
Inter-batch pipeline
Impact on latency and throughput
...and 16 more sections

Figures (10)

Figure 1: MPC ML model communication and computation decomposition; the GPU icon represents computation runtime, and the router icon represents communication runtime.
Figure 2: Inter-linear pipeline demonstration; the box with a GPU icon is the time spent for computation; the box with a router icon is the time spent transmitting data.
Figure 3: Inner-layer pipeline demonstration; the box with a GPU icon is the time spent for computation; the box with a router icon is the time spent transmitting data.
Figure 4: Latency improvement for non-linear layers with different numbers of tiles.
Figure 5: Inter-batch pipeline demonstration; the box with a GPU icon indicates that the operation is dominated by computation; the box with a router icon indicates that the operation is dominated by communication.
...and 5 more figures
