Pencil: Private and Extensible Collaborative Learning without the Non-Colluding Assumption

Xuanqi Liu, Zhuotao Liu, Qi Li, Ke Xu, Mingwei Xu

TL;DR

Pencil is the first private training framework for collaborative learning that simultaneously offers data privacy, model privacy, and extensibility to multiple data providers, without relying on the non-colluding assumption.

Abstract

The escalating focus on data privacy poses significant challenges for collaborative neural network training, where data ownership and model training/deployment responsibilities reside with distinct entities. Our community has made substantial contributions to addressing this challenge, proposing various approaches such as federated learning (FL) and privacy-preserving machine learning based on cryptographic constructs like homomorphic encryption (HE) and secure multiparty computation (MPC). However, FL completely overlooks model privacy, and HE has limited extensibility (confined to only one data provider). While the state-of-the-art MPC frameworks provide reasonable throughput and simultaneously ensure model/data privacy, they rely on a critical non-colluding assumption on the computing servers, and relaxing this assumption is still an open problem. In this paper, we present Pencil, the first private training framework for collaborative learning that simultaneously offers data privacy, model privacy, and extensibility to multiple data providers, without relying on the non-colluding assumption. Our fundamental design principle is to construct the n-party collaborative training protocol on top of an efficient two-party protocol, while ensuring that switching to different data providers during model training introduces no extra cost. We introduce several novel cryptographic protocols to realize this design principle and conduct a rigorous security and privacy analysis. Our comprehensive evaluations of Pencil demonstrate that (i) models trained in plaintext and models trained privately using Pencil exhibit nearly identical test accuracies; (ii) the training overhead of Pencil is greatly reduced: Pencil achieves 10 to 260× higher throughput and 2 orders of magnitude less communication than prior art; (iii) Pencil is resilient against both existing and adaptive (white-box) attacks.

Paper Structure

This paper contains 57 sections, 3 theorems, 41 equations, 9 figures, 10 tables, and 4 algorithms.

Key Result

Theorem 4.1

Assuming the existence of oblivious transfer, homomorphic encryption, and secure protocols for non-linearity evaluations, the Pencil framework without the preprocessing optimization is a cryptographic training protocol in the sense of the paper's definition of a private training protocol.

Figures (9)

  • Figure 1: Test accuracies for trained models. (a) to (d) are for models trained from scratch; (e) and (f) are for models trained via transfer learning.
  • Figure 2: Test accuracies of the models trained with different numbers of heterogeneous data owners (DOes).
  • Figure 3: Gradient matching attack [Zhao2020idlg] defended with different levels of noise
  • Figure 4: MLP for the MNIST task [Mohassel2018aby3, Patra2022blaze, Agrawal2019quotient]
  • Figure 5: CNN for the MNIST task [Riazi2018Chameleon]
  • ...and 4 more figures

Theorems & Definitions (5)

  • Definition 4.1
  • Theorem 4.1
  • Definition 4.2
  • Theorem 4.2
  • Corollary 4.1