FedML-HE: An Efficient Homomorphic-Encryption-Based Privacy-Preserving Federated Learning System

Weizhao Jin, Yuhang Yao, Shanshan Han, Jiajun Gu, Carlee Joe-Wong, Srivatsan Ravi, Salman Avestimehr, Chaoyang He

TL;DR

FedML-HE tackles privacy leakage in federated learning by enabling secure model aggregation via homomorphic encryption, while addressing the prohibitive overhead that has limited HE deployment on large foundation models. The authors introduce Selective Parameter Encryption, which encrypts only the most privacy-sensitive parameters based on a data-driven privacy map, dramatically reducing computation and communication costs. They provide a formal privacy analysis for base and selective protocols, including DP considerations, and demonstrate through extensive experiments that FedML-HE achieves up to ~10x overhead reduction for ResNet-50 and up to ~40x for BERT compared with fully encrypted baselines. The work offers a practical pathway to scalable HE-based FL deployments with encryption-key management, flexible privacy guarantees, and a modular software framework that supports runtime optimization and diverse HE backends.

Abstract

Federated Learning trains machine learning models on distributed devices by aggregating local model updates instead of local data. However, privacy concerns arise as the aggregated local models on the server may reveal sensitive personal information by inversion attacks. Privacy-preserving methods, such as homomorphic encryption (HE), then become necessary for FL training. Despite HE's privacy advantages, its applications suffer from impractical overheads, especially for foundation models. In this paper, we present FedML-HE, the first practical federated learning system with efficient HE-based secure model aggregation. FedML-HE proposes to selectively encrypt sensitive parameters, significantly reducing both computation and communication overheads during training while providing customizable privacy preservation. Our optimized system demonstrates considerable overhead reduction, particularly for large foundation models (e.g., ~10x reduction for ResNet-50, and up to ~40x reduction for BERT), demonstrating the potential for scalable HE-based FL deployment.
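
The selective-encryption idea in the abstract can be illustrated with a minimal, self-contained sketch. It rests on several assumptions that are not taken from the paper: a toy Paillier scheme stands in for the HE backend (chosen only because it is additively homomorphic and fits in a few lines), the primes are far too small to be secure, floats are encoded with a hypothetical fixed-point `SCALE`, and the server computes a plain unweighted average. The helper names `client_protect`, `server_aggregate`, and `client_recover` are invented for this illustration.

```python
import math
import random

# Toy Paillier parameters. Both primes are Mersenne primes, picked for
# convenience only; a modulus this small offers no real security.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1

SCALE = 10_000  # hypothetical fixed-point scale for encoding floats


def encrypt(m: int) -> int:
    """Encrypt a (possibly negative) integer; (1 + m*N) equals g^m mod N^2."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + (m % N) * N) % N2 * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Decrypt and map the result back to a signed integer."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def client_protect(weights, mask):
    """Encrypt only the masked parameters; the rest stay in plaintext."""
    return [encrypt(round(w * SCALE)) if m else w for w, m in zip(weights, mask)]


def server_aggregate(updates, mask):
    """Sum updates: ciphertexts are multiplied (homomorphic addition),
    plaintext parameters are added directly. The server never decrypts."""
    aggregated = []
    for i, masked in enumerate(mask):
        column = [u[i] for u in updates]
        if masked:
            total = 1
            for c in column:
                total = total * c % N2
            aggregated.append(total)
        else:
            aggregated.append(sum(column))
    return aggregated


def client_recover(aggregated, mask, num_clients):
    """Decrypt the masked sums and turn all sums into averages."""
    return [
        (decrypt(v) / SCALE if m else v) / num_clients
        for v, m in zip(aggregated, mask)
    ]


# Two clients, three parameters; parameters 0 and 2 are treated as sensitive.
mask = [True, False, True]
updates = [[0.5, -1.25, 2.0], [1.5, 0.25, -1.0]]
protected = [client_protect(u, mask) for u in updates]
print(client_recover(server_aggregate(protected, mask), mask, len(updates)))
```

In this sketch the cost of encryption scales with the number of `True` entries in `mask`, which is the lever the abstract describes for trading overhead against privacy protection.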

Paper Structure

This paper contains 33 sections, 4 theorems, 8 equations, 14 figures, 8 tables, 1 algorithm.

Key Result

Lemma 3.8

To achieve $\epsilon$-differential privacy, we choose the scale parameter $b$ as $b = \Delta f / \epsilon$, where $\Delta f$ is the sensitivity (Definition 3.7). With this choice of $b$, the Laplace mechanism $\mathcal{F}$ satisfies $\epsilon$-differential privacy.
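
As a rough illustration of the lemma, the sketch below implements the standard Laplace mechanism, in which the noise scale is the sensitivity divided by $\epsilon$. The function names are hypothetical, the sensitivity is assumed to be the L1 sensitivity of the released vector, and noise is drawn independently per coordinate; none of this is claimed to mirror the paper's implementation.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale b of the Laplace noise: b = sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def laplace_noise(b: float, rng: random.Random) -> float:
    """Draw one Laplace(0, b) sample as the difference of two
    independent exponential samples with mean b."""
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def laplace_mechanism(values, sensitivity, epsilon, rng=None):
    """Add independent Laplace noise to every coordinate of `values`."""
    rng = rng or random.Random()
    b = laplace_scale(sensitivity, epsilon)
    return [v + laplace_noise(b, rng) for v in values]


# A smaller epsilon (stronger privacy) yields a larger noise scale.
print(laplace_scale(1.0, 1.0), laplace_scale(1.0, 0.1))
print(laplace_mechanism([0.2, -0.7, 1.3], sensitivity=1.0, epsilon=0.5))
```

The first print makes the trade-off concrete: with the sensitivity fixed, shrinking $\epsilon$ by a factor of ten increases the noise scale by the same factor.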

Figures (14)

  • Figure 1: Data Reconstruction Attacks: an adversarial server can recover local training data from local model updates.
  • Figure 2: Computational (left) and Communication (right) Overhead Comparison for Models of Different Sizes: Naive FedML-HE vs. NVIDIA FLARE vs. Plaintext Aggregation. Due to TenSEAL's larger file sizes, FLARE did not finish the run on BERT on our 32GB memory machine.
  • Figure 3: FedML-HE System Pipeline. In the Encryption Key Agreement stage, clients either run a distributed threshold key agreement protocol or delegate key generation to a trusted key authority. The illustration is simplified by abstracting the key pair of public key and secret key (partial secret keys if using the threshold protocol) as one key. In the Encryption Mask Calculation stage, clients use local datasets to calculate local model sensitivity maps, which are homomorphically aggregated at the server to generate an encryption mask. In the Encrypted Federated Learning stage, clients use homomorphic encryption with the encryption mask to protect local model updates; the server aggregates them but has no access to the sensitive local models.
  • Figure 4: Selective Parameter Encryption. In the initialization stage, each client first calculates privacy sensitivities on the model using its own dataset, and the local sensitivities are securely aggregated into a global model privacy map. The encryption mask is then determined by the privacy map and a selection value $p$ set according to overhead requirements and privacy guarantees. Only the masked parameters are aggregated in encrypted form.
  • Figure 5: Model Privacy Map Calculated by Sensitivity on LeNet: darker color indicates higher sensitivity. Each subfigure shows the sensitivity of the parameters of one layer. Sensitivity is imbalanced across parameters, and many parameters have very little sensitivity (their gradients are hardly affected when an attacker tunes the data input).
  • ...and 9 more figures
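
The mask construction described in the Figure 4 caption can be sketched as follows. This is a simplified illustration under stated assumptions: the secure (homomorphic) aggregation of local sensitivity maps is replaced by a plain element-wise sum, $p$ is interpreted as the fraction of parameters to encrypt, and the function names `aggregate_privacy_map` and `encryption_mask` are hypothetical.

```python
import math


def aggregate_privacy_map(local_maps):
    """Element-wise sum of per-client sensitivity maps.
    Plaintext stand-in for the secure aggregation step (assumption)."""
    size = len(local_maps[0])
    if any(len(m) != size for m in local_maps):
        raise ValueError("all sensitivity maps must have the same length")
    return [sum(column) for column in zip(*local_maps)]


def encryption_mask(privacy_map, p):
    """Mark the top-p fraction of parameters, ranked by sensitivity,
    for encryption. Returns one boolean per parameter."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    k = math.ceil(p * len(privacy_map))
    ranked = sorted(
        range(len(privacy_map)), key=lambda i: privacy_map[i], reverse=True
    )
    chosen = set(ranked[:k])
    return [i in chosen for i in range(len(privacy_map))]


# Two clients report sensitivities for a four-parameter model.
local_maps = [[0.9, 0.1, 0.0, 0.4], [0.7, 0.0, 0.1, 0.6]]
global_map = aggregate_privacy_map(local_maps)
print(encryption_mask(global_map, p=0.5))
```

Under this reading, $p = 1$ falls back to encrypting every parameter and $p = 0$ to plaintext aggregation, with intermediate values trading overhead against protection.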

Theorems & Definitions (14)

  • Definition 3.1: Single-Key Adversary
  • Definition 3.2: Threshold Adversary
  • Definition 3.3: Privacy
  • Definition 3.4: Adjacent Datasets
  • Definition 3.5: $\epsilon$-Differential Privacy
  • Definition 3.6: Laplace mechanism
  • Definition 3.7: Sensitivity
  • Lemma 3.8: Achieving $\epsilon$-Differential Privacy by Laplace Mechanism (Dwork, 2008; Abadi et al., 2016)
  • Theorem 3.9: Achieving $0$-Differential Privacy by Homomorphic Encryption
  • Lemma 3.10: Sequential Composition (Dwork, 2008)
  • ...and 4 more