Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning

Yuqi Jia; Minghong Fang; Hongbin Liu; Jinghuai Zhang; Neil Zhenqiang Gong

TL;DR

Federated learning (FL) is vulnerable to targeted poisoning attacks that make the global model misclassify a chosen target input. The paper introduces FLForensics, a post-deployment poison-forensics method that traces malicious clients by computing per-client influence scores across stored check points and then clustering the two-dimensional scores (s_i, s_i′) using HDBSCAN, aided by a non-target input to disambiguate benign from malicious clients. The authors provide theoretical guarantees under a formal definition of poisoning attack and show, across five datasets and multiple attack types, that FLForensics can accurately identify malicious clients even when training-phase defenses fail and the clients' data is non-IID. They also evaluate adaptive attacks and discuss practical recovery steps after detection, including integration with other defense strategies and extension to centralized learning. FLForensics thus complements training-phase defenses with a way to hold malicious clients accountable after a poisoned model has been deployed.
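
To make the influence-score step above more concrete, here is a minimal Python sketch. This summary does not give the paper's exact formula, so the scoring rule is an assumption: each client's score is the sum over check points of the learning rate times the dot product between the loss gradient on an input and the client's stored model update. The names `two_dim_influence_scores`, `grad_target`, `grad_nontarget`, and the toy numbers are hypothetical. The HDBSCAN clustering step is left out.

```python
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def two_dim_influence_scores(checkpoints: List[dict]) -> Dict[int, Tuple[float, float]]:
    """Accumulate a pair (s_i, s_i') for every client over all check points.

    ASSUMPTION (not taken from the summary): the per-check-point contribution
    is lr * <gradient of the loss on the input, client's model update>.
    s_i uses the gradient on the misclassified target input; s_i' uses the
    gradient on a non-target input.
    """
    scores: Dict[int, Tuple[float, float]] = {}
    for cp in checkpoints:
        for client_id, update in cp["updates"].items():
            s, s_prime = scores.get(client_id, (0.0, 0.0))
            s += cp["lr"] * dot(cp["grad_target"], update)
            s_prime += cp["lr"] * dot(cp["grad_nontarget"], update)
            scores[client_id] = (s, s_prime)
    return scores


# Hypothetical toy data: two check points, two clients, 2-parameter model.
toy_checkpoints = [
    {
        "lr": 0.5,
        "grad_target": [1.0, 0.0],
        "grad_nontarget": [0.0, 1.0],
        "updates": {0: [2.0, 0.0], 1: [0.0, 2.0]},
    },
    {
        "lr": 0.5,
        "grad_target": [0.0, 1.0],
        "grad_nontarget": [1.0, 0.0],
        "updates": {0: [0.0, 4.0], 1: [2.0, 0.0]},
    },
]

scores = two_dim_influence_scores(toy_checkpoints)
for client_id, pair in sorted(scores.items()):
    print(client_id, pair)
```

In this toy setup, client 0's updates line up with the target-input gradient and client 1's with the non-target gradient, so the two clients end up at different points in the (s_i, s_i′) plane. A density-based clustering method such as HDBSCAN would then be run on those points.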

Abstract

Poisoning attacks compromise the training phase of federated learning (FL) such that the learned global model misclassifies attacker-chosen inputs called target inputs. Existing defenses mainly focus on protecting the training phase of FL such that the learned global model is poison-free. However, these defenses often achieve limited effectiveness when the clients' local training data is highly non-iid or the number of malicious clients is large, as confirmed in our experiments. In this work, we propose FLForensics, the first poison-forensics method for FL. FLForensics complements existing training-phase defenses. In particular, when training-phase defenses fail and a poisoned global model is deployed, FLForensics aims to trace back the malicious clients that performed the poisoning attack after a misclassified target input is identified. We theoretically show that FLForensics can accurately distinguish between benign and malicious clients under a formal definition of poisoning attack. Moreover, we empirically show the effectiveness of FLForensics at tracing back both existing and adaptive poisoning attacks on five benchmark datasets.

Paper Structure (31 sections, 5 theorems, 21 equations, 7 figures, 10 tables, 1 algorithm)

Introduction
Preliminaries and Related Work
Threat Model
Our FLForensics
Overview
Calculating Influence Scores
Detecting Malicious Clients
Experiments
Experimental Setup
Compared methods
Experimental Results
Ablation Studies
Adaptive Attacks
Discussion
Conclusion and Future Work
...and 16 more sections

Key Result

Theorem 1

Suppose the server picks all clients in each check-point training round, i.e., $C_t=\{1,2,\cdots,n\}$ for $t\in \Omega$, and FLForensics uses a true non-target input with target label $y$. Based on the poisoning attack definition and Assumption 1 (in the Appendix), we have that the influence sco

Figures (7)

Figure 1: Overview of FLForensics. During training, the server stores the intermediate global models and clients' model updates in some training rounds called check points. Given a misclassified target input detected after deploying the poisoned global model, the server uses FLForensics to trace back the malicious clients that performed the poisoning attack.
Figure 2: (a) Influence scores $s_i$ and (b–c) clustering results in one of our experiments using Euclidean and scaled Euclidean distance. Dots represent clients: red (malicious), green (Category I benign), and blue (Category II benign). Different markers represent different HDBSCAN clusters.
Figure 3: Ablation-study results for FLForensics. A further figure in the Appendix shows additional studies (e.g., check points, client fraction, and scaling factor).
Figure 4: Results of FLForensics for adaptive attacks.
Figure 5: Triggers in MNIST and ImageNet-Fruits datasets.
...and 2 more figures

Theorems & Definitions (11)

Definition 1: Poisoning Attack to FL
Theorem 1
proof
Theorem 2
proof
Theorem 3
proof
Corollary 1
proof
Corollary 2
...and 1 more
