Learning to Detect Malicious Clients for Robust Federated Learning

Suyi Li, Yong Cheng, Wei Wang, Yang Liu, Tianjian Chen

TL;DR

This work addresses the vulnerability of federated learning to Byzantine and backdoor attacks by introducing a spectral anomaly detection framework that operates at the central server. A variational autoencoder-based detector, trained on public data with dynamic thresholding, identifies and excludes malicious client updates before aggregation, preserving FedAvg-like convergence. Across image classification and sentiment analysis tasks with non-IID data, the method outperforms existing defenses by effectively mitigating both untargeted and targeted attacks, enabling robust FL in realistic settings. The approach offers practical impact by providing targeted defense, leveraging public data, and requiring no attacker knowledge of the detector, with promising directions for further analysis and model enhancement.
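The detect-then-aggregate step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the detector has already produced one reconstruction error per client update, that the dynamic threshold is the mean of those errors for the current round, and that the surviving updates are averaged FedAvg-style with sample-proportional weights. The function name `filter_and_aggregate` and all values are hypothetical.

```python
import numpy as np

def filter_and_aggregate(updates, recon_errors, num_samples):
    """Drop updates whose reconstruction error exceeds a per-round
    threshold, then average the remaining updates.

    Assumptions (not taken from the source text):
      - the threshold is the mean reconstruction error of this round;
      - aggregation weights are proportional to client sample counts.
    """
    updates = np.asarray(updates, dtype=float)            # shape (clients, dim)
    recon_errors = np.asarray(recon_errors, dtype=float)  # shape (clients,)
    num_samples = np.asarray(num_samples, dtype=float)    # shape (clients,)

    threshold = recon_errors.mean()       # recomputed every round
    keep = recon_errors <= threshold      # boolean mask of accepted clients

    # Renormalize the weights over the accepted clients only.
    weights = num_samples[keep] / num_samples[keep].sum()
    aggregated = weights @ updates[keep]
    return aggregated, keep, threshold

# Toy round: the third client sends an outlying update, which a detector
# would be expected to reconstruct poorly (high error).
updates = [[1.0, 1.0], [1.2, 0.8], [-10.0, -10.0]]
recon_errors = [0.1, 0.2, 5.0]
num_samples = [100, 100, 100]

aggregated, keep, threshold = filter_and_aggregate(updates, recon_errors, num_samples)
print(aggregated, keep, threshold)
```

Because the threshold is recomputed from the errors observed in each round, it adapts to the current population of updates rather than relying on a fixed cutoff chosen in advance.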

Abstract

Federated learning systems are vulnerable to attacks from malicious clients. Because the central server cannot govern the behavior of the clients, a rogue client may attack by sending malicious model updates to the server, either to degrade learning performance or to mount targeted model poisoning attacks (a.k.a. backdoor attacks). Detecting these malicious model updates and the underlying attackers in a timely manner is therefore critically important. In this work, we propose a new framework for robust federated learning in which the central server learns to detect and remove malicious model updates using a powerful detection model, leading to targeted defense. We evaluate our solution on both image classification and sentiment analysis tasks with a variety of machine learning models. Experimental results show that our solution ensures robust federated learning that is resilient to both Byzantine attacks and targeted model poisoning attacks.

Paper Structure

This paper contains 17 sections, 1 theorem, 1 equation, 5 figures, 1 table.

Key Result

Theorem 1

Let $f_a$ be the fraction of the total weights attributed to the malicious clients, where $0\le f_a \le 1$. We have

Figures (5)

  • Figure 1: LR model accuracy. Each curve corresponds to a different total weight attributed to the malicious clients.
  • Figure 2: 2D visualization in latent vector space. Green "Centralized" points are unbiased model updates. Blue "Benign" points are biased model updates from benign clients. Red "Malicious" points are model updates from malicious clients. In the left panel, the malicious clients mount the additive noise attack on the MNIST dataset. In the right panel, they mount the sign-flipping attack on the FEMNIST dataset.
  • Figure 3: Comparison of the benchmark schemes and ours. The figures in the first row show the results of the CNN model on the FEMNIST dataset. The figures in the second row show the results of the LR model on the MNIST dataset. The figures in the third row show the results of the RNN model on the Sentiment140 dataset. The figures in the first two columns correspond to additive noise attack with $30\%$ and $50\%$ attackers, respectively. The figures in the last two columns correspond to sign-flipping attack with $30\%$ and $50\%$ attackers, respectively.
  • Figure 4: Results under backdoor attacks on different datasets.
  • Figure 5: An example of inserted backdoor text "I ate a sandwich".
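The two untargeted attacks named in the figure captions, and the quantity $f_a$ from Theorem 1, can be illustrated with a small sketch. It rests on assumptions not stated in the text above: the additive noise is taken to be Gaussian with a hypothetical standard deviation `sigma`, sign-flipping is taken to negate (and optionally scale) the update, and the aggregation weights are taken to be FedAvg-style, proportional to client sample counts. All function names are illustrative.

```python
import numpy as np

def sign_flipping(update, scale=1.0):
    """Return the update with its sign flipped (assumed form of the attack)."""
    return -scale * np.asarray(update, dtype=float)

def additive_noise(update, sigma, rng):
    """Return the update plus Gaussian noise (noise model is an assumption)."""
    update = np.asarray(update, dtype=float)
    return update + rng.normal(0.0, sigma, size=update.shape)

def malicious_weight_fraction(num_samples, is_malicious):
    """Compute f_a: the share of total aggregation weight held by malicious
    clients, assuming weights proportional to sample counts."""
    num_samples = np.asarray(num_samples, dtype=float)
    is_malicious = np.asarray(is_malicious, dtype=bool)
    return num_samples[is_malicious].sum() / num_samples.sum()

rng = np.random.default_rng(0)
benign_update = np.array([1.0, -2.0])

flipped = sign_flipping(benign_update)
noisy = additive_noise(benign_update, sigma=0.5, rng=rng)

# Four clients; the last two are malicious and hold half of the data.
f_a = malicious_weight_fraction([50, 150, 100, 100], [False, False, True, True])
print(flipped, noisy, f_a)
```

Note that under these assumptions $f_a$ depends on how much data the malicious clients claim to hold, not only on how many of them there are, so a setting with 50% attackers corresponds to $f_a = 0.5$ only when data is evenly weighted across clients.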
