BackFed: An Efficient & Standardized Benchmark Suite for Backdoor Attacks in Federated Learning
Thinh Dao, Dung Thuy Nguyen, Khoa D Doan, Kok-Seng Wong
TL;DR
BackFed addresses the pervasive evaluation gaps in federated-learning backdoor research by providing a standardized, efficient benchmark that unifies attacks and defenses under a realistic FL protocol. It combines a multi-processing Ray-based backend with an extensible modular architecture to enable scalable, fair comparisons across eight datasets and a broad spectrum of attack and defense methods. The empirical results reveal that many attacks require substantial training resources and that several defenses incur notable accuracy degradation or aggregation overhead, with dynamic attacks often evading robust aggregators. The work also offers practical guidelines, such as adopting a lower server learning rate and maintaining a large pool of sampling clients with a low selection threshold, and provides open-source code to foster reliable progress in FL backdoor research.
Abstract
Research on backdoor attacks in Federated Learning (FL) has accelerated in recent years, with new attacks and defenses continually proposed in an escalating arms race. However, the evaluation of these methods remains neither standardized nor reliable. First, there are severe inconsistencies in the evaluation settings across studies, and many rely on unrealistic threat models. Second, our code review uncovers semantic bugs in the official codebases of several attacks that artificially inflate their reported performance. These issues raise fundamental questions about whether current methods are truly effective or simply overfitted to narrow experimental setups. We introduce BackFed, a benchmark designed to standardize and stress-test FL backdoor evaluation by unifying attacks and defenses under a common evaluation framework that mirrors realistic FL deployments. Our benchmark on three representative datasets with three distinct architectures reveals critical limitations of existing methods. Malicious clients often require excessive training time and computation, making them vulnerable to server-enforced time constraints. Meanwhile, several defenses incur severe accuracy degradation or aggregation overhead. Popular defenses and attacks achieve limited performance in our benchmark, which challenges their previous efficacy claims. We establish BackFed as a rigorous and fair evaluation framework that enables more reliable progress in FL backdoor research.
