Asynchronous Byzantine Federated Learning

Bart Cox, Abele Mălan, Lydia Y. Chen, Jérémie Decouchant

TL;DR

Catalyst addresses the challenge of robust asynchronous Federated Learning in the presence of Byzantine clients by applying clustering-based filtering to the earliest updates and then incorporating late updates to preserve liveness. It operates without requiring a server-held auxiliary dataset and guarantees progress with at least $2f+1$ participating clients, leveraging both fast and slow clients through a carefully designed update aggregation that accounts for staleness. Empirical results across MNIST, CIFAR-10, and WikiText-2 show Catalyst converges faster and achieves higher accuracy under gradient perturbation, gradient inversion, and backdoor attacks than state-of-the-art baselines like FedAsync, Kardam, and BASGD. The approach demonstrates strong resilience to Byzantine behavior, maintains competitive benign performance, and scales well with increasing client counts and varying Byzantine fractions, suggesting practical applicability for real-world asynchronous FL deployments.

Abstract

Federated learning (FL) enables a set of geographically distributed clients to collectively train a model through a server. Classically, the training process is synchronous, but it can be made asynchronous to maintain its speed in the presence of slow clients and in heterogeneous networks. However, the vast majority of Byzantine fault-tolerant FL systems rely on a synchronous training process. Our solution is one of the first Byzantine-resilient and asynchronous FL algorithms that does not require an auxiliary server dataset and is not delayed by stragglers, which are shortcomings of previous works. Intuitively, the server in our solution waits to receive a minimum number of updates from clients on its latest model to safely update it, and is later able to safely leverage the updates that late clients might send. We compare the performance of our solution with state-of-the-art algorithms on both image and text datasets under gradient inversion, perturbation, and backdoor attacks. Our results indicate that our solution trains a model faster than previous synchronous FL solutions and, in the presence of Byzantine clients, maintains a higher accuracy than previous asynchronous FL solutions: up to 1.54x under perturbation attacks and up to 1.75x under gradient inversion attacks.
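
The server behavior described above (wait for a minimum number of updates on the latest model, then fold in late updates) can be illustrated with a toy sketch. This is not Catalyst's actual algorithm. The class name ToyAsyncServer, the median-distance filter (a simple stand-in for the clustering-based filtering), the acceptance radius used for late updates, and the 1/(1 + staleness) discount are all assumptions made for illustration. The only detail taken from the text is the quorum of $2f+1$ updates: among any $2f+1$ updates at most $f$ come from Byzantine clients, so at least $f+1$ are honest.

```python
from statistics import median


def sq_dist(u, v):
    """Squared Euclidean distance between two update vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class ToyAsyncServer:
    """Hypothetical server: quorum of 2f+1 updates, then staleness-weighted late updates."""

    def __init__(self, model, f):
        self.model = list(model)
        self.f = f            # assumed upper bound on Byzantine clients
        self.version = 0      # version of the latest model
        self.buffer = []      # updates computed on the latest model
        self.history = {}     # version -> (center, radius) of the accepted updates

    def _apply(self, update, weight):
        self.model = [w + weight * u for w, u in zip(self.model, update)]

    def receive(self, update, base_version):
        """Handle one client update; return True if the model changed."""
        if base_version == self.version:
            self.buffer.append(update)
            if len(self.buffer) < 2 * self.f + 1:
                return False  # quorum not reached yet, keep waiting
            # Stand-in filter (assumption): keep the f+1 updates closest
            # to the coordinate-wise median of the buffered updates.
            center = [median(col) for col in zip(*self.buffer)]
            ranked = sorted(self.buffer, key=lambda u: sq_dist(u, center))
            kept = ranked[: self.f + 1]
            radius = max(sq_dist(u, center) for u in kept)
            self.history[self.version] = (center, radius)
            self._apply(mean(kept), weight=1.0)
            self.buffer = []
            self.version += 1
            return True
        if base_version in self.history:
            # Late update (assumption): accept it only if it is as close to
            # the old round's center as the updates accepted in that round.
            center, radius = self.history[base_version]
            if sq_dist(update, center) > radius:
                return False
            staleness = self.version - base_version
            self._apply(update, weight=1.0 / (1 + staleness))
            return True
        return False  # unknown or future version: ignore
```

With f = 1, the server stays idle until the third update on version 0 arrives, discards the outlier among the three, and moves to version 1. A later update computed on version 0 is then applied with half weight if it falls within the accepted radius, and rejected otherwise.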

Paper Structure

This paper contains 35 sections, 4 equations, 8 figures, 3 tables, 2 algorithms.

Figures (8)

  • Figure 1: Wall clock time depending on the standard deviation of the client computing power distribution. Each line represents a distribution with a modified standard deviation, e.g., the 2.0x line corresponds to a distribution whose standard deviation is twice as high as the one used for the 1.0x line. Higher client diversity slows down convergence.
  • Figure 2: Asynchronous FL requires less time to converge than synchronous FL. The dataset is MNIST and the number of clients is 40.
  • Figure 3: Catalyst and Flame with asynchronous clients.
  • Figure 4: Accuracy of the Catalyst, Kardam, BASGD, and FedAsync defenses with the MNIST dataset without attack and with the Gradient Inversion attack.
  • Figure 5: Accuracy of the Catalyst, Kardam, BASGD, and FedAsync defenses with the CIFAR-10 dataset without attack and with the Gradient Inversion attack.
  • ...and 3 more figures