BOBA: Byzantine-Robust Federated Learning with Label Skewness

Wenxuan Bao; Jun Wu; Jingrui He

TL;DR

This paper tackles Byzantine-robust federated learning under label-skewed non-IID data, where existing robust aggregation rules (AGRs) suffer from selection bias and increased vulnerability to attacks. It introduces BOBA, a two-stage aggregator that first fits a robust honest subspace and then uses server data to identify the vertices of the honest simplex, discarding Byzantine gradients. The authors prove convergence and bound the gradient estimation error, showing that BOBA is unbiased and that its robustness is of the optimal order; experiments on MNIST, CIFAR-10, and AG-News support these claims. BOBA shows better unbiasedness and robustness to diverse attacks than the baselines and is compatible with multiple FL frameworks, which makes it practical in realistic non-IID settings.

Abstract

In federated learning, most existing robust aggregation rules (AGRs) combat Byzantine attacks in the IID setting, where client data is assumed to be independent and identically distributed. In this paper, we address label skewness, a more realistic and challenging non-IID setting, where each client only has access to a few classes of data. In this setting, state-of-the-art AGRs suffer from selection bias, leading to a significant performance drop for particular classes; they are also more vulnerable to Byzantine attacks due to the increased variation among gradients of honest clients. To address these limitations, we propose an efficient two-stage method named BOBA. Theoretically, we prove the convergence of BOBA with an error of the optimal order. Our empirical evaluations demonstrate BOBA's superior unbiasedness and robustness across diverse models and datasets when compared to various baselines. Our code is available at https://github.com/baowenxuan/BOBA.
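The selection bias mentioned in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's experiment: it assumes three classes whose expected gradients are the standard basis vectors, ten honest clients that each hold a single class (7, 2, and 1 clients respectively), and small Gaussian noise. The helper names `make_label_skewed_gradients` and `coordinate_median` are hypothetical. With no attack at all, the coordinate-wise median lands on the majority class, while the plain mean stays at the honest gradient center.

```python
import numpy as np

def make_label_skewed_gradients(class_grads, counts, noise=0.01, seed=0):
    """Toy honest gradients: each client holds one class (assumed setup)."""
    rng = np.random.default_rng(seed)
    dim = class_grads.shape[1]
    rows = []
    for z, n_clients in enumerate(counts):
        # Every client of class z reports the class gradient plus small noise.
        rows.append(class_grads[z] + noise * rng.standard_normal((n_clients, dim)))
    return np.vstack(rows)

def coordinate_median(grads):
    """Coordinate-wise median aggregation over clients (rows)."""
    return np.median(grads, axis=0)

class_grads = np.eye(3)   # illustrative per-class expected gradients
counts = [7, 2, 1]        # class 0 is the majority class
grads = make_label_skewed_gradients(class_grads, counts)

mean_agg = grads.mean(axis=0)          # honest gradient center
median_agg = coordinate_median(grads)  # IID-style robust aggregate
bias = np.linalg.norm(median_agg - mean_agg)

print("mean  :", np.round(mean_agg, 2))
print("median:", np.round(median_agg, 2))
print("bias  :", round(float(bias), 3))
```

In this toy setting the median effectively ignores the two minority classes, which is the kind of class-specific performance drop the abstract attributes to selection bias.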

Paper Structure (87 sections, 21 theorems, 119 equations, 14 figures, 9 tables, 2 algorithms)

INTRODUCTION
RELATED WORKS
Robust AGRs with IID clients
Robust AGRs with non-IID clients
FL WITH LABEL SKEWNESS
Setup
Byzantine attack
Distribution of Honest Gradients
Challenges of Label Skewness
Selection bias
Increased vulnerability
PROPOSED BOBA ALGORITHM
Stage 1: Fitting the Honest Subspace
Objective
Optimization
...and 72 more sections

Key Result

Proposition 3.3

With $c$-label skew distribution, $\forall i \in \mathcal{H}$, we have where $\mathbb{E} \boldsymbol{\gamma}_z = \nabla_{\boldsymbol{w}} \sum_{\boldsymbol{\xi}} Q_z(\boldsymbol{\xi})\mathcal{L}(\boldsymbol{w}; \boldsymbol{\xi})$ is the expected gradient computed with data from class $z$.
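The proposition relates each honest client's expected gradient to the per-class expected gradients $\mathbb{E} \boldsymbol{\gamma}_z$. A minimal numerical sketch of the geometric consequence follows, under the assumption that each honest gradient is a convex mixture of the $c$ class gradients weighted by that client's label proportions, plus small noise. The mixing weights, dimensions, noise level, and the helper `explained_variance_ratio` are illustrative choices, not values or code from the paper. Under that assumption the centered gradients should lie close to a $(c-1)$-dimensional subspace.

```python
import numpy as np

def explained_variance_ratio(grads):
    """Share of variance along each principal direction (PCA via SVD)."""
    centered = grads - grads.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

rng = np.random.default_rng(0)
c, dim, n_clients = 4, 50, 200                 # illustrative sizes

class_grads = rng.standard_normal((c, dim))    # stand-ins for E[gamma_z]
# Assumed label proportions per client: rows are non-negative and sum to 1.
weights = rng.dirichlet(0.5 * np.ones(c), size=n_clients)
# Each honest gradient is a convex mixture of class gradients plus noise.
grads = weights @ class_grads + 0.01 * rng.standard_normal((n_clients, dim))

ratio = explained_variance_ratio(grads)
top_share = ratio[: c - 1].sum()
print(f"variance in first {c - 1} components: {top_share:.4f}")
```

Because the mixture weights sum to one, the mixtures span an affine set of dimension at most $c-1$, so nearly all variance falls on the first $c-1$ components; only the noise term spills into the rest.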

Figures (14)

Figure 1: PCA of honest gradients on MNIST ($c = 10$). Over 99% of the variance concentrates on the first $(c-1)$ principal components, verifying that honest gradients lie near the honest subspace.
Figure 2: Comparison of aggregation results. (1) Selection bias: Without attacks, the aggregation results for GeoMed, Krum, and CooMed are biased toward the majority class in the lower-right corner and deviate from the honest gradient center ($\bullet$), indicating large biases. Meanwhile, BOBA is unbiased. (2) Increased vulnerability: Different attacks lead to different aggregation results. The orange region is a heatmap (2D histogram) of possible aggregation results under various attacks, where a larger radius indicates worse robustness. BOBA has the smallest radius, showing stronger robustness than the IID AGRs.
Figure 3: Running time of AGRs on MNIST
Figure 4: BOBA is robust to corrupted server data
Figure 5: Comparison of trimmed reconstruction loss of BOBA and BOBA-ES
...and 9 more figures

Theorems & Definitions (56)

Definition 3.1: Inner, outer and total variations
Definition 3.2: $c$-label skew distribution
Proposition 3.3: Expectation of honest gradients
Proposition 5.1: Convergence
Proposition 5.4: Lower bound of gradient estimation error for any AGR
Theorem 5.5: Upper bound of gradient estimation error for BOBA
Definition B.1: $L$-smoothness
Definition B.2: $\mu$-strong convexity
Proposition 5.1: Convergence with smooth non-negative loss
proof
...and 46 more
