Efficient Byzantine-Robust and Provably Privacy-Preserving Federated Learning

Chenfei Nie, Qiang Li, Yuxin Yang, Yuede Ji, Binghui Wang

TL;DR

BPFL tackles the dual challenges of Byzantine attacks and data reconstruction in Federated Learning by integrating a zero-knowledge proof–based robustness check using dual similarity metrics against a server-trained reference model, with privacy preserved through a shared random mask negotiated via Paillier-based homomorphic encryption. The design couples a non-interactive Groth16 ZKP for valid local updates, a Mask Vector Negotiation Protocol for confidentiality, and a hash-based mechanism to prevent forgery, yielding a unified, efficient workflow. Theoretical analysis proves privacy, completeness, and soundness, with favorable per-iteration complexities. Empirical results on multiple datasets show BPFL is robust to various attacks, preserves privacy against model-inversion attempts, and incurs modest overhead compared to MPC/other baselines.

Abstract

Federated learning (FL) is an emerging distributed learning paradigm that trains models without sharing participating clients' private data. However, existing work shows that FL is vulnerable to both Byzantine (security) attacks and data reconstruction (privacy) attacks. Almost all existing FL defenses address only one of the two attacks. A few defenses address both, but they are not efficient or effective enough. We propose BPFL, an efficient Byzantine-robust and provably privacy-preserving FL method that addresses all of these issues. Specifically, we draw on state-of-the-art Byzantine-robust FL methods and use similarity metrics to measure the robustness of each participating client in FL. The validity of each client is formulated as circuit constraints on similarity metrics and verified via a zero-knowledge proof. Moreover, the client models are masked by a shared random vector, which is generated using homomorphic encryption. As a result, the server receives the masked client models rather than the true ones, and this masking is proven to preserve privacy. BPFL is also efficient owing to its use of a non-interactive zero-knowledge proof. Experimental results on various datasets show that BPFL is efficient, Byzantine-robust, and privacy-preserving.
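
To make the similarity-based validity check concrete, the following is a minimal plaintext sketch of testing a client update against a reference update with both cosine similarity and Euclidean distance. It is an illustration only: the function names, the thresholds `cos_min` and `dist_max`, and the example vectors are assumptions and do not come from the paper, and in BPFL the corresponding conditions are expressed as circuit constraints and verified with a zero-knowledge proof rather than computed in the clear.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is the zero vector)."""
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return 0.0 if denom == 0.0 else dot(u, v) / denom

def euclidean_distance(u, v):
    """L2 distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def is_valid_update(update, reference, cos_min=0.5, dist_max=1.0):
    """Accept an update only if BOTH checks pass.

    cos_min and dist_max are hypothetical thresholds chosen for this
    example; they are not values taken from the paper.
    """
    direction_ok = cosine_similarity(update, reference) >= cos_min
    magnitude_ok = euclidean_distance(update, reference) <= dist_max
    return direction_ok and magnitude_ok

reference = [0.2, 0.2]     # stand-in for the server-side reference update
close = [0.25, 0.15]       # similar direction and magnitude
scaled = [10.0, 10.0]      # same direction, very large magnitude
flipped = [-0.2, -0.2]     # small distance, opposite direction

for name, update in [("close", close), ("scaled", scaled), ("flipped", flipped)]:
    print(name, is_valid_update(update, reference))
```

In this toy setting, `scaled` passes the cosine check but fails the distance check, while `flipped` passes the distance check but fails the cosine check, which is why a single metric would not reject both.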

Paper Structure

This paper contains 13 sections, 6 theorems, 4 equations, 20 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

BPFL is privacy-preserving, i.e., the honest-but-curious server cannot infer clients’ private data if the Paillier homomorphic encryption used in Algorithm 1 is semantically secure and Groth16 is zero-knowledge.
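
The idea that the server only ever sees masked client models can be illustrated with a small sketch. It assumes the shared random vector is applied as an additive mask, which the summary above does not state explicitly, and it samples the mask locally instead of negotiating it through Paillier encryption. All names and values are hypothetical, and floating-point masks are used purely for readability; they do not provide the guarantees of the actual protocol.

```python
import random

def mask(model, shared_mask):
    """Client side: add the shared mask to the local model (assumed additive)."""
    return [w + m for w, m in zip(model, shared_mask)]

def average(models):
    """Server side: coordinate-wise mean of the vectors it receives."""
    n = len(models)
    return [sum(column) / n for column in zip(*models)]

def unmask(vector, shared_mask):
    """Client side: remove the shared mask from the aggregated vector."""
    return [x - m for x, m in zip(vector, shared_mask)]

rng = random.Random(0)  # fixed seed so the example is repeatable
client_models = [[0.1, 0.2, 0.3], [0.3, 0.2, 0.1], [0.2, 0.2, 0.2]]

# Stand-in for the negotiated mask; every client holds the same vector.
shared_mask = [rng.uniform(-100.0, 100.0) for _ in range(3)]

# The server receives only the masked models, never the true ones.
masked_models = [mask(model, shared_mask) for model in client_models]
masked_average = average(masked_models)

# Because each client adds the same mask, the mean of the masked models
# equals the true mean plus the mask, so clients can recover the aggregate.
recovered = unmask(masked_average, shared_mask)
expected = average(client_models)
print(all(abs(a - b) < 1e-9 for a, b in zip(recovered, expected)))
```

Under this additive assumption, the difference between any two masked vectors equals the difference between the unmasked ones, so a distance-based comparison is unaffected by the mask.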

Figures (20)

  • Figure 1: An honest-but-curious server recovers the raw data from the shared client models trained by FLTrust.
  • Figure 2: Overview of BPFL.
  • Figure 3: Motivation of using both the cosine similarity and Euclidean distance to detect malicious model updates. $\boldsymbol G$ is the real update, $\boldsymbol A$ and $\boldsymbol B$ are valid updates, while $\boldsymbol C$ and $\boldsymbol D$ are malicious updates.
  • Figure 4: The whole procedure of BPFL.
  • Figure 5: The complete overhead analysis of BPFL.
  • ...and 15 more figures

Theorems & Definitions (10)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Definition 1
  • Theorem 3
  • proof
  • Theorem 3
  • proof
  • Theorem 3
  • proof