A Huber Loss Minimization Approach to Byzantine Robust Federated Learning

Puning Zhao; Fei Yu; Zhiguo Wan

TL;DR

This paper introduces a gradient-aggregation method for Byzantine-robust federated learning based on a multi-dimensional Huber loss. At each round the server aggregates client gradients by solving a weighted Huber loss minimization problem. The aggregator does not require exact knowledge of the attack fraction $\epsilon$ and accommodates unbalanced and heterogeneous data. The authors give theoretical guarantees in both i.i.d. and non-i.i.d. settings, showing near-minimax dependence on $\epsilon$ and convergence for strongly convex, general convex, and non-convex objectives. A Weiszfeld-inspired algorithm solves the minimization with linear-time per-iteration cost, and numerical experiments on synthetic and MNIST data show robustness against several Byzantine attacks.
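The aggregation step described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a Huber loss that is quadratic within a threshold and linear beyond it, and it uses a Weiszfeld-style reweighted average as the solver. The function name `huber_aggregate`, the `threshold` parameter, and the coordinate-wise median initialization are all assumptions made for this example.

```python
import numpy as np

def huber_aggregate(grads, weights=None, threshold=1.0, iters=100, tol=1e-8):
    """Approximately minimize sum_i w_i * phi(s - g_i) over s.

    phi is an assumed multi-dimensional Huber loss: quadratic when
    ||s - g_i|| <= threshold, linear beyond it.
    """
    grads = np.asarray(grads, dtype=float)              # shape (m, d)
    m = grads.shape[0]
    if weights is None:
        w = np.full(m, 1.0 / m)                         # equal client weights
    else:
        w = np.asarray(weights, dtype=float)
    s = np.median(grads, axis=0)                        # robust starting point
    for _ in range(iters):
        dist = np.linalg.norm(grads - s, axis=1)
        # Weiszfeld-style reweighting: full weight inside the threshold,
        # weight shrinking as threshold / distance outside it.
        scale = np.minimum(1.0, threshold / np.maximum(dist, 1e-12))
        coef = w * scale
        s_new = coef @ grads / coef.sum()               # O(m * d) per iteration
        converged = np.linalg.norm(s_new - s) < tol
        s = s_new
        if converged:
            break
    return s

# Hypothetical example: 8 honest clients near 1.0, 2 Byzantine clients at 100.
rng = np.random.default_rng(0)
honest = rng.normal(loc=1.0, scale=0.1, size=(8, 3))
byzantine = np.full((2, 3), 100.0)
grads = np.vstack([honest, byzantine])
robust = huber_aggregate(grads, threshold=1.0)
naive = grads.mean(axis=0)
```

Each iteration touches every gradient once, which matches the linear per-iteration cost mentioned above. When all gradients lie within the threshold of their mean, the update reduces to the plain weighted average; a far-away gradient can pull the result by an amount proportional to the threshold, regardless of how far away it is.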

Abstract

Federated learning systems are susceptible to adversarial attacks. To combat this, we introduce a novel aggregator based on Huber loss minimization and provide a comprehensive theoretical analysis. Under the independent and identically distributed (i.i.d.) assumption, our approach has several advantages over existing methods. First, it has optimal dependence on $\epsilon$, the fraction of attacked clients. Second, it does not need precise knowledge of $\epsilon$. Third, it allows different clients to have unequal data sizes. We then broaden our analysis to non-i.i.d. data, in which clients have slightly different distributions.
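For orientation, a Huber-loss aggregator of this kind can be written as below. This is a sketch under assumptions, since the excerpt does not show the paper's exact definitions: $g_i$ denotes the gradient reported by client $i$, $m$ the number of clients, $w_i$ a client weight (for example, proportional to its data size), and $T$ a threshold. All four symbols are introduced here for illustration.

```latex
\phi_T(u) =
\begin{cases}
  \tfrac{1}{2}\lVert u\rVert^2, & \lVert u\rVert \le T,\\[2pt]
  T\lVert u\rVert - \tfrac{1}{2}T^2, & \lVert u\rVert > T,
\end{cases}
\qquad
\hat{g} = \operatorname*{arg\,min}_{s} \sum_{i=1}^{m} w_i\,\phi_T(s - g_i).
```

In this form, the gradient of $\phi_T$ has norm at most $T$, so a single client's influence on $\hat{g}$ is bounded no matter how extreme its reported gradient is, while gradients within distance $T$ of $\hat{g}$ are averaged as usual.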

Paper Structure (30 sections, 20 theorems, 216 equations, 5 figures, 1 table, 1 algorithm)

Introduction
The Proposed Method
Theoretical Analysis for I.I.D Case
Balanced Data
Unbalanced Data
Theoretical Analysis for Non-I.I.D Case
Implementation
Comparison with Related Work
Numerical Results
Synthesized Data
Real data
Unbalanced sample allocation
Heterogeneous Data
Conclusion
Common Lemmas on Convergence
...and 15 more sections

Key Result

Theorem 1

There exist two constants $C_1$ and $C_2$, if then under Assumption ass:cover and ass:iid, with $|\mathcal{B}|=\epsilon m$ Byzantine clients, the following equations hold with probability at least $1-\delta$. (1) (Strongly convex) Under Assumption 1(a), if $\eta\leq 1/L$, in which $\rho = \eta\mu/2$; (2) (General convex) Under Assumption 1(b), with $\eta=1/L$, after $t_m=(L/\Delta_A)\left\lVert\m

Figures (5)

Figure 1: Comparison of our new method and several baselines against Krum Attack and Trimmed Mean Attack for synthesized data with $\epsilon=0.2$.
Figure 2: Comparison of our new method and several baselines against sign-flip, KA, TMA and HLMA for MNIST data, with $\epsilon=0.2$.
Figure 3: Comparison of our new method and several baselines against sign-flip, KA, TMA and HLMA for MNIST data, with $\epsilon=0.4$.
Figure 4: Experiments on unbalanced data for HLMA, with $\epsilon=0.2$.
Figure 5: Experiments on the non-i.i.d. case under model \ref{eq:modelnew}, with $\epsilon=0.2$.

Theorems & Definitions (31)

Theorem 1
Theorem 2
Theorem 3
Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
Theorem 1
...and 21 more
