Efficient Federated Learning against Byzantine Attacks and Data Heterogeneity via Aggregating Normalized Gradients

Shiyuan Zuo; Xingrun Yan; Rongfei Fan; Li Shen; Puning Zhao; Jie Xu; Han Hu

TL;DR

This work addresses robustness in Federated Learning under Byzantine attacks and non-IID data by introducing Fed-NGA, a lightweight aggregation rule that normalizes client gradients before weighted averaging. The method has an aggregation time complexity of $\mathcal{O}(pM)$. It is proven to converge, for non-convex losses, to a neighborhood of stationary points at a rate of $\mathcal{O}(1/T^{\frac{1}{2}-\delta})$, and the paper gives conditions under which a zero optimality gap is attainable. The theoretical results cover two sets of assumptions on gradient noise and data heterogeneity. Experiments on MNIST, CIFAR10, and TinyImageNet demonstrate Fed-NGA’s robustness to multiple Byzantine attacks and substantial time-efficiency gains over baselines. The findings position Fed-NGA as a scalable, Byzantine-robust solution for non-IID FL, with practical relevance for large-scale distributed learning systems.

Abstract

Federated Learning (FL) enables multiple clients to collaboratively train models without sharing raw data. However, it is vulnerable to Byzantine attacks and data heterogeneity, both of which can severely degrade performance. Existing Byzantine-robust approaches tackle data heterogeneity but incur high computational overhead during gradient aggregation, which slows down training. To address this issue, we propose a simple yet effective Federated Normalized Gradients Algorithm (Fed-NGA), which performs aggregation by computing the weighted mean of the normalized gradients from each client. This approach yields a favorable time complexity of $\mathcal{O}(pM)$, where $p$ is the model dimension and $M$ is the number of clients. We rigorously prove that Fed-NGA is robust to both Byzantine faults and data heterogeneity. For non-convex loss functions, Fed-NGA converges to a neighborhood of stationary points under general assumptions. Under some mild conditions it further attains a zero optimality gap, an outcome rarely achieved in the existing literature. In both cases, the convergence rate is $\mathcal{O}(1/T^{\frac{1}{2} - \delta})$, where $T$ denotes the number of iterations and $\delta \in (0, 1/2)$. Experimental results on benchmark datasets confirm the superior time efficiency and convergence performance of Fed-NGA over existing methods.
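The aggregation rule described above fits in a few lines. The sketch below is a minimal NumPy illustration under several assumptions:

- Each client sends one flattened gradient.
- The client weights are nonnegative and sum to one.
- The function name `normalized_gradient_aggregate`, the `eps` safeguard, and the toy numbers are hypothetical.

How the server then applies the aggregate (step size, schedule) is not shown.

```python
import numpy as np


def normalized_gradient_aggregate(grads, weights, eps=1e-12):
    """Weighted mean of unit-normalized client gradients (a sketch of the idea).

    grads   : (M, p) array, one flattened gradient per client.
    weights : (M,) array of nonnegative client weights, assumed to sum to 1.
    eps     : floor on the norm so an all-zero gradient does not divide by zero
              (a safeguard added here, not taken from the paper).
    """
    grads = np.asarray(grads, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # One pass over all M*p entries to get each client's Euclidean norm: O(pM).
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Each row now has norm 1 (or stays 0 if it was all zeros).
    unit = grads / np.maximum(norms, eps)
    # Weighted sum of the M unit vectors: another O(pM) pass.
    return weights @ unit


# Hypothetical toy round: four honest clients point the same way, and one
# Byzantine client sends a huge vector in the opposite direction.
honest = np.array([[1.0, 0.0], [2.0, 0.0], [0.5, 0.0], [3.0, 0.0]])
byzantine = np.array([[-1e6, 0.0]])
grads = np.vstack([honest, byzantine])
weights = np.full(5, 0.2)

agg = normalized_gradient_aggregate(grads, weights)  # approximately [0.6, 0.0]
plain = grads.mean(axis=0)                           # about [-199998.7, 0.0]
print(agg, plain)
```

Normalizing first caps each client's contribution at a vector of norm equal to its weight. In this toy round, the Byzantine client can therefore move the aggregate by at most 0.2, whereas it dominates the plain mean. Both passes touch each of the $pM$ entries a constant number of times, which is consistent with the $\mathcal{O}(pM)$ aggregation cost stated above.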

Paper Structure

This paper contains 25 sections, 2 theorems, 37 equations, 17 figures, 9 tables, and 1 algorithm.

Introduction
Problem Statement
Problem Setup
Algorithm Description
Theoretical Results
Assumption
Convergence Analysis
With Assumptions ass:variance1 and ass:heterogeneity1 on the loss function
With Assumptions ass:variance2 and ass:heterogeneity2 on the loss function
Experiments
Setup
Results for Convergence Performance
Conclusion
Acknowledgment
The Reasonableness of Assumptions ass:variance2 and ass:heterogeneity2
...and 10 more sections

Key Result

Theorem 3.7

With Assumptions ass:smooth, ass:unbiased, ass:variance1, and ass:heterogeneity1, and for $\rho > 0$ such that $\frac{2\rho}{\rho+1} C_{\alpha} - 1 > 0$, we have
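Rearranging the condition on $\rho$ in Theorem 3.7 makes its requirement on $C_{\alpha}$ explicit. The algebra below uses only the stated inequality and $\rho > 0$. Reading $C_{\alpha}$ as the total weight held by honest clients, with $\bar{C}_{\alpha} = 1 - C_{\alpha}$ as the Byzantine share reported in Figure 4, is our assumption and is not stated here.

```latex
\frac{2\rho}{\rho+1}\,C_{\alpha} - 1 > 0
\;\iff\; C_{\alpha} > \frac{\rho+1}{2\rho} = \frac{1}{2} + \frac{1}{2\rho}
\;\iff\; \rho\,(2C_{\alpha} - 1) > 1 .
```

A valid $\rho > 0$ therefore exists if and only if $C_{\alpha} > 1/2$. When it does, any $\rho > 1/(2C_{\alpha} - 1)$ works. For example, $C_{\alpha} = 0.8$ (a Byzantine share of 0.2 under the assumed reading) requires $\rho > 1/0.6 \approx 1.67$.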

Figures (17)

Figure 1: Illustration of the learning process of Fed-NGA at iteration $t$.
Figure 2: The maximum test accuracy (%) of Fed-NGA and baselines with $\beta=0.6$ on the CIFAR10 dataset with the LeNet model.
Figure 3: The running time (s) of Fed-NGA and baselines under the Same-value attack with $\beta=0.6$, across three Byzantine ratios.
Figure 4: The maximum test accuracy (%) of Fed-NGA on the TinyImageNet dataset with $\bar{C}_{\alpha} = 0.2$, across three data heterogeneity concentration parameters.
Figure 5: The illustration depicts the vectors $x_1$, $x_2$, $x_1 - x_2$, $\frac{x_1}{\lVert x_1 \rVert}$, $\frac{x_2}{\lVert x_2 \rVert}$, and $\frac{x_1}{\lVert x_1 \rVert} - \frac{x_2}{\lVert x_2 \rVert}$, with the radius of the circle set to 1.
...and 12 more figures

Theorems & Definitions (4)

Theorem 3.7
proof
Theorem 3.9
proof
