Tackling the Non-IID Issue in Heterogeneous Federated Learning by Gradient Harmonization

Xinyu Zhang; Weiyu Sun; Ying Chen

TL;DR

Non-IID data and device heterogeneity cause client gradients to conflict on the server during federated learning, which hinders convergence. The authors propose FedGH, a gradient harmonization method that computes pairwise cosine similarities among the client gradients $g_t^k$ and, for each conflicting pair $(i, j)$, applies the orthogonal projections $g_t^i \leftarrow g_t^i - \frac{g_t^i \cdot \widetilde{g}_t^j}{\|\widetilde{g}_t^j\|^2} \widetilde{g}_t^j$ and $g_t^j \leftarrow g_t^j - \frac{g_t^j \cdot \widetilde{g}_t^i}{\|\widetilde{g}_t^i\|^2} \widetilde{g}_t^i$ before aggregating the gradients with weights $\frac{n_k}{n}$. The global objective is $L(w) = \sum_{k=1}^{K} \frac{n_k}{n} L_k(w)$, and the method is designed as a plug-and-play server-side module that requires no hyperparameter tuning. Empirical results on CIFAR-10/100, Tiny-ImageNet, and LEAF show that FedGH consistently improves the baselines (FedAvg, FedProx, FedNova, FedDecorr), with larger gains under stronger non-IIDness and notable reductions in communication rounds. Overall, FedGH offers a simple, effective mechanism for mitigating gradient conflicts in heterogeneous FL.
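The projection and aggregation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that gradients are flattened into plain lists of floats, that a pair conflicts when the dot product of the uploaded gradients is negative (equivalently, negative cosine similarity), and that $\widetilde{g}_t^k$ denotes a frozen copy of client $k$'s uploaded gradient. The function names `harmonize` and `aggregate` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def harmonize(grads: Sequence[Sequence[float]]) -> List[Vector]:
    """Remove the conflicting component from each conflicting client pair.

    Assumptions (not confirmed by the summary): conflict means a negative
    dot product between the uploaded gradients, and the projection
    direction is the frozen copy of the other client's uploaded gradient.
    """
    frozen = [list(g) for g in grads]  # plays the role of g-tilde
    out = [list(g) for g in grads]     # gradients being updated
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            if dot(frozen[i], frozen[j]) >= 0:
                continue  # no conflict, leave the pair untouched
            # A negative dot product implies both norms are non-zero,
            # so the divisions below are safe.
            ci = dot(out[i], frozen[j]) / dot(frozen[j], frozen[j])
            cj = dot(out[j], frozen[i]) / dot(frozen[i], frozen[i])
            out[i] = [x - ci * y for x, y in zip(out[i], frozen[j])]
            out[j] = [x - cj * y for x, y in zip(out[j], frozen[i])]
    return out


def aggregate(grads: Sequence[Sequence[float]], sizes: Sequence[int]) -> Vector:
    """Weighted sum of client gradients with weights n_k / n."""
    n = sum(sizes)
    dim = len(grads[0])
    return [sum(s / n * g[d] for g, s in zip(grads, sizes)) for d in range(dim)]
```

For example, the gradients `[1.0, 0.0]` and `[-1.0, 1.0]` conflict, and `harmonize` maps them to `[0.5, 0.5]` and `[0.0, 1.0]`, each orthogonal to the other client's original gradient. Non-conflicting pairs pass through unchanged, which is consistent with the claim that the module adds no hyperparameters.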

Abstract

Federated learning (FL) is a privacy-preserving paradigm for collaboratively training a global model from decentralized clients. However, the performance of FL is hindered by non-independent and identically distributed (non-IID) data and device heterogeneity. In this work, we revisit this key challenge through the lens of gradient conflicts on the server side. Specifically, we first investigate the gradient conflict phenomenon among multiple clients and reveal that stronger heterogeneity leads to more severe gradient conflicts. To tackle this issue, we propose FedGH, a simple yet effective method that mitigates local drifts through Gradient Harmonization. This technique projects one gradient vector onto the orthogonal plane of the other within conflicting client pairs. Extensive experiments demonstrate that FedGH consistently enhances multiple state-of-the-art FL baselines across diverse benchmarks and non-IID scenarios. Notably, FedGH yields more significant improvements in scenarios with stronger heterogeneity. As a plug-and-play module, FedGH can be seamlessly integrated into any FL framework without requiring hyperparameter tuning.

Paper Structure

This paper contains 11 sections, 1 equation, 3 figures, 5 tables, and 1 algorithm.

Introduction
Proposed Method
Problem Definition
Gradient Conflict in Heterogeneous FL
Tackling the Non-IID Issue by Gradient Harmonization
Experiments
Implementation Details
Performance Boosting
Ablation study on the number of clients
Ablation study on the number of local epochs
Conclusion

Figures (3)

Figure 1: Illustration of optimization directions from (a) centralized training, (b) homogeneous FL, and (c) heterogeneous FL with gradient conflicts.
Figure 2: The non-IID issue causes gradient conflicts among (a) 5 and (c) 10 clients, with $\alpha$ = 0.1 and 0.01, respectively. The $x$-axis is client index pairs sorted by $y$, while the $y$-axis shows cosine similarity between gradient vectors. The change in gradient conflict ratio during training is depicted in (b).
Figure 3: Illustration of FedGH's effectiveness. If gradient conflict occurs between clients $i$ and $j$ in round $t$, FedGH projects $g_t^i$ and $g_t^j$ onto each other's orthogonal planes. We highlight that FedGH yields a faster convergence rate in (a) while effectively mitigating local drifts from global minima in (b).
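
The two quantities plotted in Figure 2, sorted pairwise cosine similarities and the gradient conflict ratio, can be computed as in the sketch below. It assumes that the conflict ratio is the fraction of client pairs whose cosine similarity is negative, which the captions do not state explicitly. The helper names are hypothetical, and gradients are again treated as flat lists of floats.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is zero."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def sorted_pairwise_cosine(grads: Sequence[Sequence[float]]) -> List[float]:
    """Cosine similarity of every client pair, sorted in ascending order."""
    pairs = combinations(range(len(grads)), 2)
    return sorted(cosine_similarity(grads[i], grads[j]) for i, j in pairs)


def conflict_ratio(grads: Sequence[Sequence[float]]) -> float:
    """Fraction of client pairs with negative cosine similarity.

    Assumption: this is how the conflict ratio is defined.
    """
    sims = sorted_pairwise_cosine(grads)
    if not sims:
        return 0.0
    return sum(1 for s in sims if s < 0) / len(sims)
```

With three clients whose gradients are `[1.0, 0.0]`, `[0.0, 1.0]`, and `[-1.0, 0.0]`, only the first and third conflict, so one of the three pairs has negative similarity.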
