Communication-Efficient Byzantine-Resilient Federated Zero-Order Optimization

Afonso de Sá Delgado Neto, Maximilian Egger, Mayank Bakshi, Rawad Bitar

TL;DR

CyBeR-0 addresses Byzantine-resilient federated learning with memory- and communication-efficient zero-order optimization. It compresses a $d$-dimensional gradient into $k$ scalars via a shared seed for perturbation directions, and employs a trimmed-mean robust aggregator to mitigate adversarial updates, achieving convergence guarantees for convex losses under IID data. Empirically, CyBeR-0 matches or closely approaches non-Byzantine accuracy on MNIST and enables substantial communication savings (up to orders of magnitude) while fine-tuning RoBERTa-Large on NLP tasks under Byzantine attacks. The work combines zero-order estimation, communication compression, and Byzantine robustness to enable practical, robust federated learning in resource-constrained environments.
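The compression step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the central two-point finite difference, the smoothing parameter `mu`, and the plain average used for reconstruction are all assumptions made for the example. The point it shows is that client and server regenerate the same $k$ perturbation directions from a shared seed, so only $k$ scalars need to be transmitted instead of $d$ gradient coordinates.

```python
import math
import random


def sample_directions(seed, k, d):
    """Regenerate k unit-norm perturbation directions from a shared seed."""
    rng = random.Random(seed)
    directions = []
    for _ in range(k):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in z))
        directions.append([v / norm for v in z])
    return directions


def client_scalars(loss, w, seed, k, mu=1e-3):
    """Client side: one finite-difference scalar per direction (k in total).

    Assumption: a central two-point difference; the paper's exact
    zero-order estimate (Definition 3.1) may differ.
    """
    scalars = []
    for z in sample_directions(seed, k, len(w)):
        w_plus = [wi + mu * zi for wi, zi in zip(w, z)]
        w_minus = [wi - mu * zi for wi, zi in zip(w, z)]
        scalars.append((loss(w_plus) - loss(w_minus)) / (2.0 * mu))
    return scalars


def reconstruct(scalars, seed, d):
    """Server side: rebuild a d-dimensional estimate from the k scalars.

    Assumption: a plain average of scalar-weighted directions; any
    dimension-dependent scaling factor is omitted here.
    """
    k = len(scalars)
    estimate = [0.0] * d
    for g, z in zip(scalars, sample_directions(seed, k, d)):
        for i in range(d):
            estimate[i] += g * z[i] / k
    return estimate


if __name__ == "__main__":
    def quadratic(w):
        return sum(wi * wi for wi in w)

    w0 = [1.0, -2.0, 0.5]
    sent = client_scalars(quadratic, w0, seed=7, k=4)  # only 4 floats are sent
    print(len(sent), len(reconstruct(sent, seed=7, d=len(w0))))
```

Because the directions are a deterministic function of the seed, the per-round upload in this sketch is $k$ floats regardless of $d$, which is where the communication saving comes from.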

Abstract

We introduce CyBeR-0, the first zero-order optimization algorithm for memory- and communication-efficient Federated Learning that is resilient to Byzantine faults. We show through extensive numerical experiments on the MNIST dataset and on fine-tuning RoBERTa-Large that CyBeR-0 outperforms state-of-the-art algorithms in terms of communication and memory efficiency while reaching similar accuracy. We provide theoretical guarantees on its convergence for convex loss functions.

Paper Structure

This paper contains 25 sections, 19 theorems, 57 equations, 5 figures, 6 tables, 2 algorithms.

Key Result

Theorem 5.8

Let $\mu \ge 0$, ${\boldsymbol z} \in \mathbb{S}^d$, $\epsilon>0$. Then for any ${\boldsymbol z}_r \in \mathbb{S}^d$ for $r \in [k]$, under Assumptions assump:smooth, assump:subexp, $\alpha \le \beta < \frac{1}{2} - \epsilon$, and with probability at least $1-\frac{4}{(1+2nm\hat{L}_\mu)^d(1+nm\hat{L}_{\dots})\cdots}$, for …

Figures (5)

  • Figure 1: CyBeR-0 for logistic regression on MNIST under non-IID data distribution. Figures (a) and (b) show the convergence for varying $k$ in the absence of Byzantine clients compared to federated averaging (FedAvg). Figure (c) shows different attacks for $k=64$ and $\alpha=\beta=0.25$.
  • Figure 2: Effect of Byzantine clients on the convergence speed of CyBeR-0.
  • Figure 3: CyBeR-0 - Experimental Setup
  • Figure 4: Contrasting Byzantine and Non-Byzantine Scenarios Across Diverse Data Distributions with RoBERTa-large on TREC and SNLI: This figure compares the performance of CyBeR-0 using a RoBERTa-large model on both the TREC and SNLI datasets. "Non-Byzantine" denotes CyBeR-0 with neither Byzantine clients nor robust aggregation.
  • Figure 5: CyBeR-0 - Experimental Setup with Local Epochs

Theorems & Definitions (36)

  • Definition 2.1: Attack Model
  • Definition 3.1: Zero-Order estimate
  • Definition 3.2: Trimmed Mean (adopted from yinByzantineRobustDistributedLearning2021)
  • Definition 5.2: Population Loss
  • Definition 5.4: Zero-Order Population Estimate
  • Theorem 5.8: Robustness Error Bound
  • Theorem 5.10: Convergence, $\mu = 0$
  • Definition 5.11: Smoothed Version of $F$
  • Theorem 5.13: Convergence, $\mu> 0$
  • Lemma B.1
  • ...and 26 more
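
Definition 3.2 in the list above names a trimmed mean as the robust aggregator. The following is a minimal sketch of the common coordinate-wise form, assuming a trimming fraction `beta` below 1/2 (consistent with the condition $\alpha \le \beta < \frac{1}{2} - \epsilon$ in Theorem 5.8). The function names and the choice to apply it independently to each of the $k$ scalar positions are assumptions for illustration, not details confirmed by this summary.

```python
def trimmed_mean(values, beta):
    """Drop the floor(beta * n) smallest and largest values, average the rest."""
    n = len(values)
    t = int(beta * n)  # number of values trimmed from each end
    if 2 * t >= n:
        raise ValueError("beta too large: no values left after trimming")
    kept = sorted(values)[t:n - t]
    return sum(kept) / len(kept)


def aggregate(per_client_scalars, beta):
    """Apply the trimmed mean independently to each scalar position.

    per_client_scalars: one list of k scalars per client (hypothetical layout).
    """
    return [trimmed_mean(list(column), beta) for column in zip(*per_client_scalars)]


if __name__ == "__main__":
    # Three honest clients and one client sending extreme values.
    updates = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [100.0, -100.0]]
    print(aggregate(updates, beta=0.25))
```

With `beta=0.25` and four clients, one value is trimmed from each end per position, so the single outlying client in the example cannot shift either aggregated scalar beyond the range of the honest values.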