Learning to Detect Malicious Clients for Robust Federated Learning
Suyi Li, Yong Cheng, Wei Wang, Yang Liu, Tianjian Chen
TL;DR
This work addresses the vulnerability of federated learning (FL) to Byzantine and backdoor attacks with a spectral anomaly detection framework that runs at the central server. A variational autoencoder-based detector, trained on public data and applied with dynamic thresholding, identifies and excludes malicious client updates before aggregation, preserving FedAvg-like convergence. On image classification and sentiment analysis tasks with non-IID data, the method outperforms existing defenses at mitigating both untargeted and targeted attacks. Its practical appeal is that the defense is targeted, the detector is trained on public data, and attackers have no prior knowledge of the detector; further analysis and model enhancement are left as future directions.
Abstract
Federated learning systems are vulnerable to attacks from malicious clients. Because the central server cannot govern the behavior of the clients, a rogue client may launch an attack by sending malicious model updates to the server, either to degrade learning performance or to mount targeted model poisoning attacks (a.k.a. backdoor attacks). Timely detection of these malicious model updates and the underlying attackers is therefore critically important. In this work, we propose a new framework for robust federated learning in which the central server learns to detect and remove malicious model updates using a powerful detection model, leading to targeted defense. We evaluate our solution on both image classification and sentiment analysis tasks with a variety of machine learning models. Experimental results show that our solution ensures robust federated learning that is resilient to both Byzantine attacks and targeted model poisoning attacks.
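
The detect-and-remove step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each client update is a flat list of floats, that the detection model exposes a per-update reconstruction error, and that the dynamic threshold is the mean error within the current round. The names `filter_and_aggregate` and `toy_recon_error` are hypothetical, and the toy error function is only a stand-in for a trained variational autoencoder.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def filter_and_aggregate(
    updates: Sequence[Vector],
    weights: Sequence[float],
    recon_error: Callable[[Vector], float],
) -> Tuple[Vector, List[int]]:
    """Drop updates with above-threshold reconstruction error, then
    take a weighted average of the remaining updates (FedAvg-style)."""
    errors = [recon_error(u) for u in updates]
    # Assumption: the dynamic threshold is the mean error of this round.
    threshold = sum(errors) / len(errors)
    # The smallest error never exceeds the mean, so `kept` is never empty.
    kept = [i for i, e in enumerate(errors) if e <= threshold]
    total = sum(weights[i] for i in kept)
    dim = len(updates[0])
    aggregate = [
        sum(weights[i] * updates[i][d] for i in kept) / total
        for d in range(dim)
    ]
    return aggregate, kept


def toy_recon_error(update: Vector) -> float:
    # Hypothetical stand-in for a trained detector: squared distance
    # from a fixed reference point that plays the role of "benign".
    reference = [1.0, 1.0]
    return sum((x - r) ** 2 for x, r in zip(update, reference))


# Three similar updates and one outlier, with equal client weights.
updates = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [-5.0, 9.0]]
weights = [10.0, 10.0, 10.0, 10.0]
aggregate, kept = filter_and_aggregate(updates, weights, toy_recon_error)
```

In this toy round the outlying fourth update is excluded and the aggregate is computed from the first three only. Because the threshold is derived from the errors observed in each round, it is not fixed ahead of time.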
