Kick Bad Guys Out! Conditionally Activated Anomaly Detection in Federated Learning with Zero-Knowledge Proof Verification
Shanshan Han, Wenxuan Wu, Baturalp Buyukates, Weizhao Jin, Qifan Zhang, Yuhang Yao, Salman Avestimehr, Chaoyang He
TL;DR
This work tackles the vulnerability of Federated Learning (FL) to adversarial clients by introducing RedJasper, a two-stage anomaly detector that activates only when attacks are suspected, coupled with Zero-Knowledge Proofs (ZKPs) that let clients verify the server-side detection. The first stage uses cross-round similarity checks to gate a second stage, which computes an L2-distance-based "evilness" score for each local model and applies a $3\sigma$-based cutoff to remove malicious updates. Key contributions include a practical on-demand defense that requires no unrealistic prior knowledge, conditional activation that preserves benign accuracy, a statistically principled removal mechanism, and cryptographic verifiability via ZKPs. Experimental results across CV and NLP tasks show robust performance under diverse attack scenarios, and the ZKP verification incurs feasible overhead. Overall, RedJasper narrows the gap between theoretical FL security and real-world deployment by delivering trustworthy, transparent defense in heterogeneous environments.
Abstract
Federated Learning (FL) systems are susceptible to adversarial attacks, such as model poisoning attacks and backdoor attacks. Existing defense mechanisms face critical limitations in real-world deployments, such as relying on impractical assumptions (e.g., adversaries acknowledging the presence of attacks before attacking) or undermining accuracy in model training, even in benign scenarios. To address these challenges, we propose RedJasper, a two-stage anomaly detection method specifically designed for real-world FL deployments. It identifies suspicious activities in the first stage, then conditionally activates the second stage to further scrutinize the suspicious local models, applying the 3σ rule to identify the truly malicious local models and filter them out of FL training. To ensure integrity and transparency within the FL system, RedJasper integrates zero-knowledge proofs, enabling clients to cryptographically verify the server's detection process without relying on the server's goodwill. RedJasper operates without unrealistic assumptions and avoids interfering with FL training in attack-free scenarios. It bridges the gap between theoretical advances in FL security and the practical demands of real-world deployment. Experimental results demonstrate that RedJasper consistently delivers performance comparable to benign cases, highlighting its effectiveness in identifying potential attacks and eliminating malicious models with high accuracy.
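To make the second-stage filtering concrete, the following is a minimal sketch of a 3σ cutoff applied to L2-distance scores. It is an illustration under stated assumptions, not the authors' implementation: the function names (`coordinate_median`, `l2_distance`, `three_sigma_filter`) are hypothetical, local models are represented as flat lists of floats, the reference point is assumed to be the coordinate-wise median of the submitted models (the text above does not specify what the distance is measured against), and the first-stage gate and the zero-knowledge proof are omitted.

```python
import math
import statistics


def coordinate_median(models):
    """Coordinate-wise median of the local models (an assumed reference point)."""
    return [statistics.median(column) for column in zip(*models)]


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length parameter vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def three_sigma_filter(models, reference):
    """Score each model by its L2 distance to the reference and drop outliers.

    A model is removed when its score exceeds mean + 3 * std of all scores
    (a one-sided 3-sigma cutoff, using the population standard deviation).
    Returns (kept_indices, removed_indices, cutoff).
    """
    scores = [l2_distance(m, reference) for m in models]
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    cutoff = mu + 3 * sigma
    kept = [i for i, s in enumerate(scores) if s <= cutoff]
    removed = [i for i, s in enumerate(scores) if s > cutoff]
    return kept, removed, cutoff


# Toy round: 19 benign clients clustered near (1, 1) and one far-away update.
models = [[1.0 + 0.01 * i, 1.0 - 0.01 * i] for i in range(19)] + [[50.0, -50.0]]
reference = coordinate_median(models)
kept, removed, cutoff = three_sigma_filter(models, reference)
```

In this toy round the single distant update is the only one whose score exceeds the cutoff. Note that with the population standard deviation, a lone outlier among n scores can be at most sqrt(n - 1) standard deviations above the mean, so a 3σ cutoff of this form can only flag a single outlier when more than ten models participate.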
