Exploit Gradient Skewness to Circumvent Byzantine Defenses for Federated Learning
Yuchen Liu, Chen Chen, Lingjuan Lyu, Yaochu Jin, Gang Chen
TL;DR
Federated Learning under non-IID data distributions exhibits gradient skew: a dense cluster of honest gradients deviates from the optimal gradient $\bar{\bm{g}}$ (the average of honest gradients), which weakens standard Byzantine defenses. The authors propose STRIKE, a two-stage skew-aware attack that first identifies skewed honest gradients using the search direction $\bm{u}_{\text{search}}=\bm{g}_{\text{med}}-\bar{\bm{g}}$ and then crafts Byzantine gradients within that skewed set $\mathcal{S}$ by forming $\bm{g}_{\mathrm{b}}=\bar{\bm{g}}_{\mathcal{S}}+\nu\alpha\cdot\mathrm{sign}(\bar{\bm{g}}_{\mathcal{S}})\odot\boldsymbol{\sigma}_{\mathcal{S}}$. Empirical results on CIFAR-10, ImageNet-12, and FEMNIST show that STRIKE consistently outperforms twelve baseline attacks against seven robust aggregation rules (e.g., Multi-Krum, Median, RFA, Aksel, DnC, RBTM) and remains effective under bucketing and NNM, with notable gains (e.g., a 57.84% improvement against DnC on FEMNIST with 20% Byzantine clients). The findings reveal gradient skew as a practical threat to current defenses and motivate future work on defenses that are resilient to skew-aware attacks.
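The two stages above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions that the summary does not confirm: $\bm{g}_{\text{med}}$ is taken to be the coordinate-wise median, the skewed set $\mathcal{S}$ is chosen as the `k` honest gradients with the largest projection onto $\bm{u}_{\text{search}}$, $\boldsymbol{\sigma}_{\mathcal{S}}$ is the coordinate-wise standard deviation over $\mathcal{S}$, and `alpha` and `nu` are passed in as plain scalars. The function name `strike_sketch` and the parameter `k` are hypothetical.

```python
import numpy as np


def strike_sketch(honest_grads, k, alpha=1.0, nu=1.0):
    """Illustrative two-stage sketch; selection rule and scalars are assumptions."""
    g = np.asarray(honest_grads, dtype=float)  # shape: (n_honest, d)

    # Stage 1: search for the skewed gradients.
    g_bar = g.mean(axis=0)         # optimal gradient: average of honest gradients
    g_med = np.median(g, axis=0)   # ASSUMPTION: coordinate-wise median
    u_search = g_med - g_bar       # search direction from the text

    # ASSUMPTION: keep the k gradients with the largest projection on u_search.
    scores = g @ u_search
    skewed_idx = np.argsort(scores)[-k:]
    skewed = g[skewed_idx]

    # Stage 2: build the Byzantine gradient inside the skewed set.
    mean_s = skewed.mean(axis=0)
    sigma_s = skewed.std(axis=0)   # ASSUMPTION: coordinate-wise standard deviation
    g_b = mean_s + nu * alpha * np.sign(mean_s) * sigma_s
    return g_b, skewed_idx
```

In a toy setting with four tightly clustered gradients and one distant honest gradient, the distant one pulls the average away from the cluster, so the search direction points back toward the cluster and the crafted gradient lands next to it.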
Abstract
Federated Learning (FL) is notorious for its vulnerability to Byzantine attacks. Most current Byzantine defenses share a common inductive bias: among all the gradients, the densely distributed ones are more likely to be honest. However, such a bias is a poison to Byzantine robustness due to a phenomenon newly discovered in this paper: gradient skew. We discover that a group of densely distributed honest gradients skews away from the optimal gradient (the average of honest gradients) due to heterogeneous data. This gradient skew phenomenon allows Byzantine gradients to hide within the densely distributed skewed gradients. As a result, Byzantine defenses are confused into believing that Byzantine gradients are honest. Motivated by this observation, we propose a novel skew-aware attack called STRIKE: first, we search for the skewed gradients; then, we construct Byzantine gradients within the skewed gradients. Experiments on three benchmark datasets validate the effectiveness of our attack.
