Beyond Latency: A System-Level Characterization of MPC and FHE for PPML

Pengzhi Huang, Kiwan Maeng, G. Edward Suh

Abstract

Privacy protection has become an increasing concern in modern machine learning applications. Privacy-preserving machine learning (PPML) has attracted growing research attention, with approaches such as secure multiparty computation (MPC) and fully homomorphic encryption (FHE) being actively explored. However, existing evaluations of these approaches have frequently been conducted on narrow, fragmented setups and focused only on a single performance metric, such as the online inference latency at a specific batch size. From the existing reports, it is hard to compare the different approaches, especially when considering other metrics such as energy or monetary cost, or broader system setups (varying hyperparameters, offline overheads, future hardware/network configurations, etc.). We present a unified characterization of three popular approaches -- two variants of MPC, based on arithmetic/binary sharing conversion and on function secret sharing, and FHE -- in terms of their performance and cost when performing privacy-preserving inference on multiple CNN and Transformer models. We study a range of LAN and WAN environments, model sizes, batch sizes, and input sequence lengths. We evaluate not only performance but also the energy consumption and monetary cost of deployment under realistic scenarios, taking into account both offline and online computation/communication overheads. We provide empirical guidance for selecting, optimizing, and deploying these privacy-preserving compute paradigms, and outline how evolving hardware and network trends are likely to shift the trade-offs between the two MPC schemes and FHE. This work provides system-level insights for researchers and practitioners who seek to understand or accelerate PPML workloads.

Paper Structure

This paper contains 16 sections, 1 equation, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Throughput (normalized to batch size 1) of MPC$_\text{FSS}$ and MPC$_\textsc{A2B}$ with different batch sizes. WAN$_\text{F}$ assumed.
  • Figure 2: MPC$_\text{FSS}$ execution latency of the following task under different local key pool sizes: 200 jobs in WAN$_\text{S}$, each performing 128-batch inference on ResNet-20 (requiring around 3.8 TB of keys in total; note the performance change around this size), arriving at the server following a Poisson process with an average inter-arrival time of 10 seconds. A hypothetical simulation of this workload is sketched below the figure list.
  • Figure 3: The proportion of total online+offline execution time for the MPC$_\text{FSS}$ and MPC$_\textsc{A2B}$ schemes under different networks, with batch size 128. $\ast$ denotes estimated results.
  • Figure 4: The total monetary cost of each inference or generated token for the MPC$_\text{FSS}$ and MPC$_\textsc{A2B}$ schemes and FHE under different settings, with batch size 128. Bar lengths are normalized by the largest value among the three approaches. $\ast$ denotes estimated results.
  • Figure 5: Total energy cost per generated token for the MPC$_\text{FSS}$ and MPC$_\textsc{A2B}$ schemes and FHE across different model/network settings (above, normalized by the largest value among the three approaches), along with the proportion of energy cost contributed by each method (below), with batch size 128. $\ast$ denotes estimated results.
  • ...and 3 more figures
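
To make the Figure 2 workload concrete, below is a minimal, hypothetical discrete-event sketch in Python: 200 jobs arrive as a Poisson process (mean inter-arrival 10 s), each consuming its share of the roughly 3.8 TB of FSS keys from a bounded local key pool that the offline phase refills whenever the server is idle. Only the job count, inter-arrival time, and total key footprint come from the caption; the per-job online latency (`ONLINE_LATENCY_S`) and key-generation rate (`KEYGEN_RATE_GB_PER_S`) are placeholder assumptions, not measurements from the paper.

```python
import random

# Hypothetical parameters for the workload described in the Figure 2 caption.
# NUM_JOBS, the mean inter-arrival time, and the total key footprint come from
# the caption; the online latency and key-generation rate are assumptions.
NUM_JOBS = 200                        # jobs arriving at the server
MEAN_INTERARRIVAL_S = 10.0            # Poisson arrivals: exponential gaps, mean 10 s
KEYS_PER_JOB_GB = 3800.0 / NUM_JOBS   # ~3.8 TB of FSS keys total -> ~19 GB per job
ONLINE_LATENCY_S = 60.0               # assumed online time per 128-batch ResNet-20 job
KEYGEN_RATE_GB_PER_S = 2.0            # assumed offline key generation/transfer rate


def mean_latency(pool_size_gb: float, seed: int = 0) -> float:
    """Average job latency (arrival to completion) for a given key pool size."""
    random.seed(seed)
    pool_gb = pool_size_gb            # pool starts full of pre-generated keys
    clock = 0.0                       # arrival time of the current job
    busy_until = 0.0                  # server finishes its previous job at this time
    latencies = []
    for _ in range(NUM_JOBS):
        clock += random.expovariate(1.0 / MEAN_INTERARRIVAL_S)
        start = max(clock, busy_until)
        # The offline phase refills the pool whenever the server sits idle.
        idle = max(0.0, start - busy_until)
        pool_gb = min(pool_size_gb, pool_gb + idle * KEYGEN_RATE_GB_PER_S)
        # If the pool cannot cover this job's keys, stall until enough exist.
        if pool_gb < KEYS_PER_JOB_GB:
            start += (KEYS_PER_JOB_GB - pool_gb) / KEYGEN_RATE_GB_PER_S
            pool_gb = 0.0
        else:
            pool_gb -= KEYS_PER_JOB_GB
        busy_until = start + ONLINE_LATENCY_S
        latencies.append(busy_until - clock)
    return sum(latencies) / len(latencies)


for pool_tb in (0.5, 1.0, 2.0, 3.8, 6.0):
    print(f"pool {pool_tb:>4.1f} TB -> mean latency {mean_latency(pool_tb * 1000):8.1f} s")
```

Under these assumed rates, sweeping the pool size in this toy model shows the qualitative knee the caption points to: once the local pool can hold the entire key footprint of the job stream (around 3.8 TB), jobs no longer stall waiting for key generation.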