Exploit Gradient Skewness to Circumvent Byzantine Defenses for Federated Learning

Yuchen Liu, Chen Chen, Lingjuan Lyu, Yaochu Jin, Gang Chen

TL;DR

Federated Learning under non-IID data distributions exhibits gradient skew: a dense cluster of honest gradients deviates from the optimal gradient $\bar{\bm{g}}$ (the mean of the honest gradients), which weakens standard Byzantine defenses. The authors propose STRIKE, a two-stage skew-aware attack. It first identifies the skewed honest gradients using the search direction $\bm{u}_{\text{search}}=\bm{g}_{\text{med}}-\bar{\bm{g}}$. It then crafts Byzantine gradients within that skewed set $\mathcal{S}$ as $\bm{g}_{\mathrm{b}}=\bar{\bm{g}}_{\mathcal{S}}+\nu\alpha\cdot\mathrm{sign}(\bar{\bm{g}}_{\mathcal{S}})\odot\boldsymbol{\sigma}_{\mathcal{S}}$. Empirical results on CIFAR-10, ImageNet-12, and FEMNIST show that STRIKE consistently outperforms twelve baseline attacks against seven robust aggregation rules (e.g., Multi-Krum, Median, RFA, Aksel, DnC, RBTM). It also remains effective under bucketing and NNM, with notable gains (e.g., a 57.84% improvement against DnC on FEMNIST with 20% Byzantine clients). The findings identify gradient skew as a practical threat to current defenses and motivate future work on defenses that are resilient to skew-aware attacks.
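The two stages described above can be sketched in plain Python as follows. This is a minimal, hypothetical illustration and not the authors' implementation. It rests on several assumptions. The attacker can see the honest gradients. $\bm{g}_{\text{med}}$ is the coordinate-wise median. $\bar{\bm{g}}_{\mathcal{S}}$ and $\boldsymbol{\sigma}_{\mathcal{S}}$ are the coordinate-wise mean and standard deviation of the selected set. $\nu$ and $\alpha$ are treated as given scalars. The rule for picking $\mathcal{S}$ is a stand-in: keep the `k` gradients with the largest projection onto $\bm{u}_{\text{search}}$. The names `search_skewed`, `craft_byzantine`, and `k` are likewise chosen only for illustration.

```python
import random
import statistics


def coord_mean(vectors):
    """Coordinate-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def coord_median(vectors):
    """Coordinate-wise median of equal-length vectors."""
    return [statistics.median(col) for col in zip(*vectors)]


def coord_std(vectors):
    """Coordinate-wise population standard deviation."""
    return [statistics.pstdev(col) for col in zip(*vectors)]


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def search_skewed(grads, k):
    """Stage 1 (hypothetical heuristic): score each gradient by its projection
    onto u_search = g_med - g_bar and keep the k highest-scoring ones."""
    g_bar = coord_mean(grads)
    u = [m - b for m, b in zip(coord_median(grads), g_bar)]

    def score(g):
        return sum(ui * (gi - bi) for ui, gi, bi in zip(u, g, g_bar))

    return sorted(grads, key=score, reverse=True)[:k]


def craft_byzantine(skewed, nu, alpha):
    """Stage 2: g_b = mean_S + nu * alpha * sign(mean_S) * std_S, elementwise."""
    mean_s = coord_mean(skewed)
    std_s = coord_std(skewed)
    return [m + nu * alpha * sign(m) * s for m, s in zip(mean_s, std_s)]


# Toy data (hypothetical): six gradients clustered near (1, 1), three sparse ones near (-2, -2).
random.seed(0)
grads = [[1 + random.gauss(0, 0.1), 1 + random.gauss(0, 0.1)] for _ in range(6)]
grads += [[-2 + random.gauss(0, 0.1), -2 + random.gauss(0, 0.1)] for _ in range(3)]

S = search_skewed(grads, k=6)                 # selects the dense, skewed cluster
g_b = craft_byzantine(S, nu=1.0, alpha=1.0)   # about one std outward from the cluster mean
print(g_b)
```

In this toy run the mean of all nine gradients is near the origin, while the median sits inside the dense cluster. As a result, $\bm{u}_{\text{search}}$ points toward the cluster, and the crafted gradient lands at the cluster's edge rather than near $\bar{\bm{g}}$.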

Abstract

Federated Learning (FL) is notorious for its vulnerability to Byzantine attacks. Most current Byzantine defenses share a common inductive bias: among all the gradients, the densely distributed ones are more likely to be honest. However, this bias is a poison to Byzantine robustness because of a phenomenon newly discovered in this paper: gradient skew. We discover that, due to heterogeneous data, a group of densely distributed honest gradients skews away from the optimal gradient (the average of the honest gradients). This gradient skew allows Byzantine gradients to hide within the densely distributed skewed gradients. As a result, Byzantine defenses are misled into treating the Byzantine gradients as honest. Motivated by this observation, we propose a novel skew-aware attack called STRIKE. First, we search for the skewed gradients; then, we construct Byzantine gradients within the skewed gradients. Experiments on three benchmark datasets validate the effectiveness of our attack.
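To make the inductive bias concrete, the toy example below uses hypothetical 2-D gradients and a simple distance-to-median filter. The filter keeps the vectors closest to the coordinate-wise median and then averages them. It is an illustrative stand-in (`median_filter` is a hypothetical name), not any of the defenses evaluated in the paper.

```python
import math
import statistics


def coord_mean(vectors):
    """Coordinate-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def coord_median(vectors):
    """Coordinate-wise median of equal-length vectors."""
    return [statistics.median(col) for col in zip(*vectors)]


def median_filter(vectors, keep):
    """Toy density-favoring defense: keep the `keep` vectors closest to the median."""
    med = coord_median(vectors)
    return sorted(vectors, key=lambda v: math.dist(v, med))[:keep]


# Hypothetical honest gradients: a dense skewed cluster plus a sparse group.
cluster = [[1.0, 0.0], [1.1, 0.1], [0.9, -0.1], [1.0, 0.1], [1.0, -0.1]]
sparse = [[-3.0, 4.0], [-2.8, 3.8]]
honest = cluster + sparse

# Byzantine gradients hidden inside the dense cluster.
byz = [[1.05, 0.05], [0.95, -0.05]]

g_bar = coord_mean(honest)  # optimal gradient: mean of honest gradients
kept = median_filter(honest + byz, keep=len(honest))
agg = coord_mean(kept)

print("Byzantine kept:", sum(v in kept for v in byz))          # Byzantine kept: 2
print("sparse honest kept:", sum(v in kept for v in sparse))   # sparse honest kept: 0
```

The sparse honest gradients lie far from the median, so the filter discards them and keeps both Byzantine gradients. The aggregate therefore stays near the skewed cluster instead of moving toward $\bar{\bm{g}}$.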

Paper Structure

This paper contains 29 sections, 14 equations, 10 figures, 5 tables, 2 algorithms.

Figures (10)

  • Figure 1: The LLE visualization of honest gradients in the non-IID setting on CIFAR-10. Substantial honest gradients (blue circles) are skewed away from the optimal gradient (green star). In this case, we can hide Byzantine gradients (pink crosses) within the skewed honest gradients to circumvent defenses.
  • Figure 2: Visualization of gradient skew on CIFAR-10 dataset. As shown in the figures, the optimal gradients (green stars) deviate from the densely distributed gradients.
  • Figure 3: Illustration of the proposed two-stage attack STRIKE: in the first stage, STRIKE searches for the skewed honest gradients; in the second stage, STRIKE hides Byzantine gradients within the skewed honest gradients.
  • Figure 4: Accuracy under different attacks against seven robust AGRs with bucketing on ImageNet-12. The lower, the better.
  • Figure 5: Visualization of gradient skew on the ImageNet-12 and FEMNIST datasets.
  • ...and 5 more figures

Theorems & Definitions (1)

  • Definition 1: $(f, \kappa)$-robustness
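The page lists only the name of Definition 1. For reference, the form commonly used in the Byzantine-robust learning literature is given below. We assume, without having confirmed it against the full text, that the paper's Definition 1 follows it. The symbols $n$, $F$, $\bm{x}_i$, and $S$ are introduced here for illustration.

```latex
% Assumed standard form of (f, kappa)-robustness; not verified against the paper.
Let $n \ge 1$, $0 \le f < n/2$, and $\kappa \ge 0$. An aggregation rule $F$ is
$(f,\kappa)$-robust if, for any vectors $\bm{x}_1,\dots,\bm{x}_n$ and any set
$S \subseteq \{1,\dots,n\}$ with $|S| = n - f$,
\[
  \bigl\| F(\bm{x}_1,\dots,\bm{x}_n) - \bar{\bm{x}}_S \bigr\|^2
  \;\le\; \frac{\kappa}{|S|} \sum_{i \in S} \bigl\| \bm{x}_i - \bar{\bm{x}}_S \bigr\|^2,
  \qquad
  \bar{\bm{x}}_S = \frac{1}{|S|} \sum_{i \in S} \bm{x}_i .
\]
```

Taking $S$ to be the honest clients makes $\bar{\bm{x}}_S$ the optimal gradient $\bar{\bm{g}}$. The bound then caps how far the aggregate can drift from $\bar{\bm{g}}$, relative to the spread of the honest gradients.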