Communication-Efficient Byzantine-Resilient Federated Zero-Order Optimization

Afonso de Sá Delgado Neto; Maximilian Egger; Mayank Bakshi; Rawad Bitar

TL;DR

CyBeR-0 addresses Byzantine-resilient federated learning with memory- and communication-efficient zero-order optimization. It compresses a $d$-dimensional gradient into $k$ scalars via a shared seed for perturbation directions, and employs a trimmed-mean robust aggregator to mitigate adversarial updates, achieving convergence guarantees for convex losses under IID data. Empirically, CyBeR-0 matches or closely approaches non-Byzantine accuracy on MNIST and enables substantial communication savings (up to orders of magnitude) while fine-tuning RoBERTa-Large on NLP tasks under Byzantine attacks. The work combines zero-order estimation, communication compression, and Byzantine robustness to enable practical, robust federated learning in resource-constrained environments.
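As a rough illustration of the shared-seed compression described above, the following sketch shows how a client could send $k$ scalars and how the server could rebuild a $d$-dimensional estimate from them. The two-point finite difference, the $d/k$ scaling, and all function names (`unit_directions`, `client_scalars`, `reconstruct`) are assumptions made for illustration; they are not taken from the paper's Definition 3.1.

```python
import math
import random


def unit_directions(seed, k, d):
    """Regenerate k unit-norm perturbation directions from a shared seed."""
    rng = random.Random(seed)
    dirs = []
    for _ in range(k):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in z))
        dirs.append([v / norm for v in z])
    return dirs


def client_scalars(loss, w, seed, k, mu=1e-3):
    """Client side: compute k finite-difference scalars (assumed two-point form)."""
    out = []
    for z in unit_directions(seed, k, len(w)):
        w_plus = [wi + mu * zi for wi, zi in zip(w, z)]
        w_minus = [wi - mu * zi for wi, zi in zip(w, z)]
        out.append((loss(w_plus) - loss(w_minus)) / (2.0 * mu))
    return out


def reconstruct(scalars, seed, d):
    """Server side: rebuild a d-dimensional estimate from the k scalars.

    The d/k factor is an assumed normalization for unit-sphere directions.
    """
    k = len(scalars)
    dirs = unit_directions(seed, k, d)
    return [d / k * sum(g * z[i] for g, z in zip(scalars, dirs)) for i in range(d)]


if __name__ == "__main__":
    w = [1.0, 2.0, 3.0, 4.0]
    quadratic = lambda v: 0.5 * sum(x * x for x in v)  # toy loss, gradient is w
    scalars = client_scalars(quadratic, w, seed=7, k=8)
    print(len(scalars), "scalars sent instead of", len(w), "coordinates")
    print(reconstruct(scalars, seed=7, d=len(w)))
```

Because both sides regenerate the same directions from `seed`, only the `k` floats in `scalars` need to cross the network in this sketch; the saving is meaningful when $k \ll d$.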

Abstract

We introduce CyBeR-0, the first zero-order optimization algorithm for memory- and communication-efficient Federated Learning that is resilient to Byzantine faults. We show through extensive numerical experiments on the MNIST dataset and on fine-tuning RoBERTa-Large that CyBeR-0 outperforms state-of-the-art algorithms in terms of communication and memory efficiency while reaching similar accuracy. We provide theoretical guarantees on its convergence for convex loss functions.

Paper Structure (25 sections, 19 theorems, 57 equations, 5 figures, 6 tables, 2 algorithms)

Introduction
Problem Setting
Robust Zero-Order Federated Learning
Properties of CyBeR-0
Experiments
Experimental Setup
CyBeR-0 with Logistic Regression on MNIST
Fine-Tuning Language Models with CyBeR-0
Theoretical Analysis
Preliminaries
Robustness Error Bound
Convergence Analysis
Related Work
Conclusion
Experiments
...and 10 more sections

Key Result

Theorem 5.8

Let $\mu \ge 0$, ${\boldsymbol z} \in \mathbb{S}^d$, and $\epsilon>0$. Then for any ${\boldsymbol z}_r \in \mathbb{S}^d$, $r \in [k]$, under Assumptions assump:smooth and assump:subexp, $\alpha \le \beta < \frac{1}{2} - \epsilon$, and with probability at least $1-\frac{4}{(1+2nm\hat{L}_\mu)^d(1+nm\hat{L}_ …

Figures (5)

Figure 1: CyBeR-0 for logistic regression on MNIST under non-IID data distribution. Figures (a) and (b) show the convergence for varying $k$ in the absence of Byzantine clients compared to federated averaging (FedAvg). Figure (c) shows different attacks for $k=64$ and $\alpha=\beta=0.25$.
Figure 2: Effect of Byzantine clients on the convergence speed of CyBeR-0.
Figure 3: CyBeR-0 - Experimental Setup
Figure 4: Contrasting Byzantine and Non-Byzantine Scenarios Across Diverse Data Distributions with RoBERTa-large on TREC and SNLI: This figure compares the performance of CyBeR-0 using a RoBERTa-large model on both the TREC and SNLI datasets. Non-Byzantine behavior refers to CyBeR-0 with neither Byzantine clients nor robust aggregation.
Figure 5: CyBeR-0 - Experimental Setup with Local Epochs

Theorems & Definitions (36)

Definition 2.1: Attack Model
Definition 3.1: Zero-Order estimate
Definition 3.2: Trimmed Mean (adopted from Yin et al., 2021)
Definition 5.2: Population Loss
Definition 5.4: Zero-Order Population Estimate
Theorem 5.8: Robustness Error Bound
Theorem 5.10: Convergence, $\mu = 0$
Definition 5.11: Smoothed Version of $F$
Theorem 5.13: Convergence, $\mu> 0$
Lemma B.1
...and 26 more
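
Definition 3.2 in the list above refers to the trimmed mean used as the robust aggregator. A minimal sketch follows, assuming the common coordinate-wise form in which the $\lfloor \beta n \rfloor$ smallest and the $\lfloor \beta n \rfloor$ largest of $n$ values are discarded before averaging. The function names `trimmed_mean` and `aggregate`, the flooring rule, and the layout of one list of $k$ scalars per client are assumptions for illustration rather than the paper's exact definition.

```python
def trimmed_mean(values, beta):
    """Average the values left after dropping floor(beta * n) from each end."""
    n = len(values)
    t = int(beta * n)  # assumed rounding: floor
    if 2 * t >= n:
        raise ValueError("beta trims away every value")
    kept = sorted(values)[t:n - t]
    return sum(kept) / len(kept)


def aggregate(client_scalars, beta):
    """Apply the trimmed mean independently to each of the k scalar positions.

    client_scalars: one list of k scalars per client (hypothetical layout).
    """
    return [trimmed_mean(list(column), beta) for column in zip(*client_scalars)]


if __name__ == "__main__":
    # Three honest clients and one client sending extreme values.
    reports = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [1000.0, -1000.0]]
    print(aggregate(reports, beta=0.25))
```

In this toy run the extreme report from the fourth client is discarded in both positions, so the aggregate depends only on values sent by the other three clients.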
