Poisoning Attacks and Defenses to Federated Unlearning
Wenbin Wang, Qiwen Ma, Zifan Zhang, Yuchen Liu, Zhuqing Liu, Minghong Fang
TL;DR
This paper identifies a security gap in federated unlearning (FU) by introducing BadUnlearn, a poisoning attack designed to keep the unlearned model poisoned during FU. To counter this, it proposes UnlearnGuard, a robust FU framework that estimates client updates from the federated learning (FL) history and validates them via distance-based and direction-aware calibrations (UnlearnGuard-Dist and UnlearnGuard-Dir), supported by a theoretical bound showing that the unlearned model can closely approximate a train-from-scratch model. The approach combines Hessian-vector approximations computed with L-BFGS and historical update analysis to filter unreliable updates, and it shows strong empirical resilience against diverse attacks across multiple aggregation rules. The work demonstrates that BadUnlearn can compromise standard FU methods, while UnlearnGuard maintains robustness and compatibility with existing FL pipelines, offering a practical path toward secure, reliable federated unlearning in real-world deployments.
Abstract
Federated learning allows multiple clients to collaboratively train a global model with the assistance of a server. However, its distributed nature makes it susceptible to poisoning attacks, where malicious clients can compromise the global model by sending harmful local model updates to the server. Federated unlearning has been introduced to recover an accurate global model from a poisoned one after the malicious clients have been identified. Yet current research on federated unlearning has concentrated primarily on its effectiveness and efficiency, overlooking the security challenges it presents. In this work, we bridge this gap by proposing BadUnlearn, the first poisoning attack targeting federated unlearning. In BadUnlearn, malicious clients send specially crafted local model updates to the server during the unlearning process, aiming to ensure that the resulting unlearned model remains poisoned. To mitigate these threats, we propose UnlearnGuard, a robust federated unlearning framework that is provably robust against both existing poisoning attacks and our BadUnlearn. The core concept of UnlearnGuard is for the server to estimate the clients' local model updates during the unlearning process and employ a filtering strategy to verify the accuracy of these estimations. Theoretically, we prove that the model unlearned through UnlearnGuard closely resembles one obtained by train-from-scratch. Empirically, we show that BadUnlearn can effectively corrupt existing federated unlearning methods, while UnlearnGuard remains secure against poisoning attacks.
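The filtering idea described above (the server estimates each client's update and then checks whether the estimate can be trusted) can be illustrated with a minimal sketch. This is not the paper's exact rule: the function names, the choice of the stored historical update as the reference, and the thresholds `max_distance` and `min_cosine` are all assumptions made for illustration. The distance check loosely mirrors the idea behind UnlearnGuard-Dist and the direction check the idea behind UnlearnGuard-Dir.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two update vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def accept_by_distance(estimated, reference, max_distance):
    """Distance-based check (assumed form): keep the estimate only if it
    stays within max_distance of the reference update."""
    return l2_distance(estimated, reference) <= max_distance


def accept_by_direction(estimated, reference, min_cosine):
    """Direction-aware check (assumed form): keep the estimate only if it
    points in roughly the same direction as the reference update."""
    return cosine_similarity(estimated, reference) >= min_cosine


def filter_estimates(estimates, references, accept, threshold):
    """Return the sorted ids of clients whose estimated update passes `accept`.

    estimates / references: dicts mapping a client id to an update vector.
    The reference is assumed here to be the update stored during the
    original FL training run (a hypothetical choice for this sketch).
    """
    return sorted(
        cid for cid, est in estimates.items()
        if accept(est, references[cid], threshold)
    )
```

For example, calling `filter_estimates(estimates, history, accept_by_distance, 0.5)` keeps only the clients whose estimates lie close to their stored updates, while passing `accept_by_direction` instead would also keep an estimate that is much larger in magnitude but points the same way. The two checks can therefore disagree, which is one plausible reason to offer both variants.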
