BOBA: Byzantine-Robust Federated Learning with Label Skewness

Wenxuan Bao, Jun Wu, Jingrui He

TL;DR

This paper tackles Byzantine-robust federated learning under label-skewed non‑IID data, where existing AGRs suffer from selection bias and increased vulnerability to attacks. It introduces BOBA, a two‑stage aggregator that first learns a robust honest subspace and then uses server data to identify the vertices of the honest simplex, discarding Byzantine gradients. The authors prove convergence and bound the gradient estimation error, showing that BOBA is unbiased and that its robustness is of the optimal order. Experiments on MNIST, CIFAR‑10, and AG‑News support these claims: BOBA is less biased and more robust to diverse attacks than the baselines, and it is compatible with multiple FL frameworks.

Abstract

In federated learning, most existing robust aggregation rules (AGRs) combat Byzantine attacks in the IID setting, where client data is assumed to be independent and identically distributed. In this paper, we address label skewness, a more realistic and challenging non-IID setting, where each client only has access to a few classes of data. In this setting, state-of-the-art AGRs suffer from selection bias, leading to a significant performance drop for particular classes; they are also more vulnerable to Byzantine attacks due to the increased variation among the gradients of honest clients. To address these limitations, we propose an efficient two-stage method named BOBA. Theoretically, we prove the convergence of BOBA with an error of the optimal order. Our empirical evaluations demonstrate BOBA's superior unbiasedness and robustness across diverse models and datasets when compared to various baselines. Our code is available at https://github.com/baowenxuan/BOBA.
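
The selection bias described in the abstract can be seen in a very small toy case. The sketch below is illustrative only and is not taken from the paper: the two class gradients, the 7/3 client split, and the variable names are all assumptions, and coordinate-wise median stands in for an AGR designed for the IID setting.

```python
import numpy as np

# Hypothetical toy setup: 7 honest clients hold only class A data,
# 3 honest clients hold only class B data (extreme label skew).
grad_a = np.array([1.0, 0.0])  # assumed gradient for class A
grad_b = np.array([0.0, 1.0])  # assumed gradient for class B
honest = np.stack([grad_a] * 7 + [grad_b] * 3)

# Reference target: the plain average of all honest gradients.
mean_agg = honest.mean(axis=0)

# Coordinate-wise median, a common robust aggregation rule.
coomed_agg = np.median(honest, axis=0)

# Distance between the robust aggregate and the honest average.
bias = np.linalg.norm(coomed_agg - mean_agg)

print("honest mean:       ", mean_agg)
print("coordinate median: ", coomed_agg)
print("bias (L2 distance):", bias)
```

With no attacker present, the median already collapses onto the majority class gradient and ignores class B entirely, which is the kind of per-class performance drop the abstract attributes to selection bias.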

Paper Structure

This paper contains 87 sections, 21 theorems, 119 equations, 14 figures, 9 tables, 2 algorithms.

Key Result

Proposition 3.3

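With $c$-label skew distribution, $\forall i \in \mathcal{H}$, the expected gradient of honest client $i$ is a convex combination of the class-wise expected gradients $\mathbb{E} \boldsymbol{\gamma}_1, \dots, \mathbb{E} \boldsymbol{\gamma}_c$, where $\mathbb{E} \boldsymbol{\gamma}_z = \nabla_{\boldsymbol{w}} \sum_{\boldsymbol{\xi}} Q_z(\boldsymbol{\xi})\mathcal{L}(\boldsymbol{w}; \boldsymbol{\xi})$ is the expected gradient computed with data from class $z$.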
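
In symbols, the proposition can be sketched as follows. The notation is an assumption made here for illustration, since the displayed equation is not reproduced in this summary: $\boldsymbol{g}_i$ denotes the gradient of honest client $i$, and $p_{iz}$ denotes the fraction of client $i$'s data that carries label $z$.

$$
\mathbb{E}\, \boldsymbol{g}_i = \sum_{z=1}^{c} p_{iz}\, \mathbb{E}\, \boldsymbol{\gamma}_z,
\qquad p_{iz} \ge 0, \quad \sum_{z=1}^{c} p_{iz} = 1.
$$

Because the weights sum to one, every expected honest gradient lies in the affine hull of the $c$ class-wise expected gradients, which has dimension at most $c-1$. Because the weights are also non-negative, it lies inside the simplex whose vertices are those class-wise gradients.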

Figures (14)

  • Figure 1: PCA of honest gradients on MNIST ($c = 10$). Over 99% of the variance concentrates on the first $(c-1)$ principal components, verifying that honest gradients lie near the honest subspace.
  • Figure 2: Comparison of aggregation results. (1) Selection bias: Without attacks, the aggregation results for GeoMed, Krum and CooMed are biased toward the majority class in the lower-right corner and deviate from the honest gradient center ($\bullet$), indicating large bias, whereas BOBA is unbiased. (2) Increased vulnerability: The aggregation result varies with the attack. The orange region is a heatmap (2D histogram) of possible aggregation results under various attacks, where a larger radius indicates worse robustness. BOBA has the smallest radius, showing stronger robustness than the IID AGRs.
  • Figure 3: Running time of AGRs on MNIST
  • Figure 4: BOBA is robust to corrupted server data
  • Figure 5: Comparison of trimmed reconstruction loss of BOBA and BOBA-ES
  • ...and 9 more figures
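
The variance concentration reported for Figure 1 can be reproduced qualitatively on synthetic data. The sketch below does not use the paper's MNIST gradients. It assumes random class-wise gradients, Dirichlet-distributed label proportions per client, and small Gaussian noise; the function names and parameters are hypothetical.

```python
import numpy as np

def simulate_honest_gradients(n_clients=100, c=10, dim=50, noise=0.01, seed=0):
    """Synthetic honest gradients: label-skewed mixtures of c class gradients."""
    rng = np.random.default_rng(seed)
    # Stand-ins for the class-wise expected gradients (assumed, not from the paper).
    class_grads = rng.normal(size=(c, dim))
    # Each client's label proportions; a small alpha gives skewed clients.
    weights = rng.dirichlet(alpha=np.full(c, 0.5), size=n_clients)
    # Convex combination of class gradients plus small sampling noise.
    return weights @ class_grads + noise * rng.normal(size=(n_clients, dim))

def explained_variance_ratio(grads):
    """Fraction of total variance captured by each principal component."""
    centered = grads - grads.mean(axis=0, keepdims=True)
    # Squared singular values of the centered matrix give the PCA spectrum.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

c = 10
grads = simulate_honest_gradients(c=c)
ratio = explained_variance_ratio(grads)
top = ratio[: c - 1].sum()
print(f"variance in first c-1 components: {top:.4f}")
```

In this toy model the noiseless gradients have rank at most $c-1$ after centering, because each client's mixing weights sum to one, so almost all variance falls on the first $c-1$ components and the remainder is noise.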

Theorems & Definitions (56)

  • Definition 3.1: Inner, outer and total variations
  • Definition 3.2: $c$-label skew distribution
  • Proposition 3.3: Expectation of honest gradients
  • Proposition 5.1: Convergence
  • Proposition 5.4: Lower bound of gradient estimation error for any AGR
  • Theorem 5.5: Upper bound of gradient estimation error for BOBA
  • Definition B.1: $L$-smoothness
  • Definition B.2: $\mu$-strong convexity
  • Proposition 5.1: Convergence with smooth non-negative loss
  • proof
  • ...and 46 more