Efficient Byzantine-Robust and Provably Privacy-Preserving Federated Learning
Chenfei Nie, Qiang Li, Yuxin Yang, Yuede Ji, Binghui Wang
TL;DR
BPFL tackles the dual challenges of Byzantine attacks and data reconstruction in Federated Learning by integrating a zero-knowledge proof–based robustness check using dual similarity metrics against a server-trained reference model, with privacy preserved through a shared random mask negotiated via Paillier-based homomorphic encryption. The design couples a non-interactive Groth16 ZKP for valid local updates, a Mask Vector Negotiation Protocol for confidentiality, and a hash-based mechanism to prevent forgery, yielding a unified, efficient workflow. Theoretical analysis proves privacy, completeness, and soundness, with favorable per-iteration complexities. Empirical results on multiple datasets show BPFL is robust to various attacks, preserves privacy against model-inversion attempts, and incurs modest overhead compared to MPC/other baselines.
Abstract
Federated learning (FL) is an emerging distributed learning paradigm that does not require sharing participating clients' private data. However, existing work shows that FL is vulnerable to both Byzantine (security) attacks and data reconstruction (privacy) attacks. Almost all existing FL defenses address only one of the two attacks. A few defenses address both, but they are neither efficient nor effective enough. We propose BPFL, an efficient Byzantine-robust and provably privacy-preserving FL method that addresses all of these issues. Specifically, we draw on state-of-the-art Byzantine-robust FL methods and use similarity metrics to measure the robustness of each participating client. The validity of each client is formulated as circuit constraints on the similarity metrics and verified via a zero-knowledge proof. Moreover, the client models are masked by a shared random vector, which is generated using homomorphic encryption. As a result, the server receives the masked client models rather than the true ones, so the true models provably remain private. BPFL is also efficient because it uses a non-interactive zero-knowledge proof. Experimental results on various datasets show that BPFL is efficient, Byzantine-robust, and privacy-preserving.
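
To make the two ideas in the abstract concrete, the following is a minimal plaintext sketch, not the authors' implementation. It assumes the two similarity metrics are cosine similarity and Euclidean distance to a reference model, and that clients remove the shared mask from the aggregate themselves. The function names and the thresholds `min_cos` and `max_dist` are hypothetical. In BPFL the validity check is expressed as circuit constraints and proven in zero knowledge, and the mask is negotiated under homomorphic encryption; both steps are omitted here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """L2 distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def is_valid_update(update, reference, min_cos, max_dist):
    """Hypothetical dual-metric check: the update must point in a similar
    direction to the reference AND stay close to it in magnitude."""
    return (cosine_similarity(update, reference) >= min_cos
            and euclidean_distance(update, reference) <= max_dist)


def mask(update, shared_mask):
    """Client side: add the shared random vector before uploading."""
    return [w + r for w, r in zip(update, shared_mask)]


def aggregate(masked_updates):
    """Server side: coordinate-wise average of the masked updates.
    Because every client added the same mask, the result equals the
    true average plus that mask."""
    n = len(masked_updates)
    return [sum(column) / n for column in zip(*masked_updates)]


def unmask(masked_average, shared_mask):
    """Client side (assumed): subtract the shared mask from the aggregate."""
    return [w - r for w, r in zip(masked_average, shared_mask)]
```

Because every client adds the same vector, averaging the masked models yields the true average shifted by that vector, so the server never handles an unmasked model in this sketch. Using two metrics together is meant to catch both updates that point in the wrong direction and updates that are scaled far beyond the reference.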
