A Huber Loss Minimization Approach to Byzantine Robust Federated Learning

Puning Zhao, Fei Yu, Zhiguo Wan

TL;DR

This paper introduces a gradient-aggregation method for Byzantine-robust federated learning based on a multi-dimensional Huber loss. By solving a weighted Huber loss minimization at each round, the server robustly aggregates client gradients without requiring exact knowledge of the attack fraction ε, and adapts to unbalanced and heterogeneous data. The authors provide theoretical guarantees under i.i.d. and non-i.i.d. settings, showing near-minimax ε-dependence and robust convergence for strongly convex, general convex, and non-convex objectives. A Weiszfeld-inspired algorithm solves the minimization with linear-time per-iteration cost, and numerical experiments on synthetic and MNIST data demonstrate robustness against several Byzantine attacks. Overall, the approach offers a principled, ε-agnostic, scalable solution for robust gradient aggregation in federated learning with realistic data heterogeneity.
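The aggregation step described above can be illustrated with a minimal sketch. It minimizes a weighted multi-dimensional Huber loss by iteratively reweighted averaging, in the spirit of a Weiszfeld iteration. This is not the authors' exact algorithm: the function name `huber_aggregate`, the single fixed `threshold`, the coordinate-wise median start point, and the stopping rule are all assumptions made for illustration.

```python
import numpy as np


def huber_aggregate(grads, weights=None, threshold=1.0, n_iter=100, tol=1e-8):
    """Approximately minimize sum_i w_i * phi(c - g_i) over c.

    phi is the multi-dimensional Huber loss: quadratic when the norm of its
    argument is at most `threshold`, linear beyond it. All parameter names
    here are hypothetical, chosen for this sketch.
    """
    grads = np.asarray(grads, dtype=float)  # shape (m clients, d dimensions)
    m = grads.shape[0]
    if weights is None:
        weights = np.full(m, 1.0 / m)  # equal weights (assumption)
    else:
        weights = np.asarray(weights, dtype=float)

    # Robust start point (an assumption of this sketch).
    c = np.median(grads, axis=0)
    for _ in range(n_iter):
        dist = np.linalg.norm(grads - c, axis=1)
        # Inside the threshold a gradient keeps full weight; outside it,
        # its weight shrinks as threshold / distance.
        scale = np.minimum(1.0, threshold / np.maximum(dist, 1e-12))
        coef = weights * scale
        c_new = coef @ grads / coef.sum()
        converged = np.linalg.norm(c_new - c) <= tol
        c = c_new
        if converged:
            break
    return c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    honest = 1.0 + 0.1 * rng.standard_normal((16, 5))  # honest gradients near 1
    byzantine = np.full((4, 5), 100.0)                  # 20% corrupted clients
    all_grads = np.vstack([honest, byzantine])
    print("plain mean:     ", all_grads.mean(axis=0))
    print("Huber aggregate:", huber_aggregate(all_grads, threshold=1.0))
```

Each pass costs time linear in the number of clients times the dimension. To reflect unequal data sizes, `weights` could be set proportional to each client's sample count, which is again an assumption of this sketch rather than a detail taken from the paper.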

Abstract

Federated learning systems are susceptible to adversarial attacks. To combat this, we introduce a novel aggregator based on Huber loss minimization, and provide a comprehensive theoretical analysis. Under the independent and identically distributed (i.i.d.) assumption, our approach has several advantages compared to existing methods. Firstly, it has optimal dependence on $ε$, which stands for the ratio of attacked clients. Secondly, our approach does not need precise knowledge of $ε$. Thirdly, it allows different clients to have unequal data sizes. We then broaden our analysis to include non-i.i.d. data, in which clients have slightly different distributions.
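For orientation, the standard multi-dimensional Huber loss and the resulting aggregation problem can be written as follows. The notation is an assumption of this sketch: $g_i$ is the gradient reported by client $i$, $w_i$ its weight, $m$ the number of clients, and $T$ a single threshold; the paper's own notation and choice of thresholds may differ.

```latex
\begin{align*}
\phi_T(u) &=
  \begin{cases}
    \tfrac{1}{2}\lVert u\rVert^2, & \lVert u\rVert \le T,\\[2pt]
    T\lVert u\rVert - \tfrac{1}{2}T^2, & \lVert u\rVert > T,
  \end{cases}\\
\hat{g} &= \arg\min_{c} \sum_{i=1}^{m} w_i\,\phi_T(c - g_i),\\
0 &= \sum_{i=1}^{m} w_i \min\!\Bigl(1, \tfrac{T}{\lVert \hat{g} - g_i\rVert}\Bigr)\,(\hat{g} - g_i).
\end{align*}
```

The last line is the first-order optimality condition. It shows that each client's pull on $\hat{g}$ has norm at most $w_i T$, so arbitrarily large Byzantine gradients can shift the aggregate only by a bounded amount. As $T\to\infty$ the minimizer is the weighted mean, and as $T\to 0$ it approaches the weighted geometric median.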

Paper Structure

This paper contains 30 sections, 20 theorems, 216 equations, 5 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

There exist two constants $C_1$ and $C_2$ such that, if [...], then under Assumptions ass:cover and ass:iid, with $|\mathcal{B}|=\epsilon m$ Byzantine clients, the following hold with probability at least $1-\delta$. (1) (Strongly convex) Under Assumption 1(a), if $\eta\leq 1/L$, [...], in which $\rho = \eta\mu/2$; (2) (General convex) Under Assumption 1(b), with $\eta=1/L$, after $t_m=(L/\Delta_A)\lVert\cdots$ [...]

Figures (5)

  • Figure 1: Comparison of our new method and several baselines against Krum Attack and Trimmed Mean Attack for synthesized data with $\epsilon=0.2$.
  • Figure 2: Comparison of our new method and several baselines against sign-flip, KA, TMA and HLMA for MNIST data, with $\epsilon=0.2$.
  • Figure 3: Comparison of our new method and several baselines against sign-flip, KA, TMA and HLMA for MNIST data, with $\epsilon=0.4$.
  • Figure 4: Experiments on unbalanced data for HLMA, with $\epsilon=0.2$.
  • Figure 5: Experiments on the non-i.i.d. case under model \ref{eq:modelnew}, with $\epsilon=0.2$.

Theorems & Definitions (31)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Theorem 1
  • ...and 21 more