Gradient-Congruity Guided Federated Sparse Training

Chris Xing Tian; Yibing Liu; Haoliang Li; Ray C. C. Cheung; Shiqi Wang

TL;DR

Federated learning on edge devices faces high computation and communication costs as well as non-IID data. FedSGC combines dynamic sparse training with gradient congruity inspection: a prune-and-grow mechanism, guided by a global direction map $d^{r+1}=\operatorname{sign}(\theta^{r+1}-\theta^r)$ and a target sparsity $S$, prunes conflicting neurons and promotes growth of consistently informative ones. The approach includes a gradient-guided pruning/growing criterion and a global aggregation strategy that maintains sparsity $S$ while leveraging non-participating clients' updates, enabling robust learning with reduced costs. Across MNIST, CIFAR-10, and PACS (a domain-generalization-style setting), FedSGC achieves competitive accuracy with substantial communication savings and strong compatibility with FedProx, indicating practical value for resource-constrained FL under heterogeneity.

Abstract

Edge computing allows artificial intelligence and machine learning models to be deployed on edge devices, where they can learn from local data and collaborate to form a global model. Federated learning (FL) is a distributed machine learning technique that facilitates this process while preserving data privacy. However, FL also faces challenges: high computational and communication costs on resource-constrained devices, and poor generalization caused by data heterogeneity across edge clients and the presence of out-of-distribution data. In this paper, we propose Gradient-Congruity Guided Federated Sparse Training (FedSGC), a novel method that integrates dynamic sparse training and gradient congruity inspection into the federated learning framework to address these issues. Our method builds on the idea that neurons whose associated gradients conflict in direction with the global model carry information that is irrelevant or less generalizable for other clients, and can therefore be pruned during sparse training. Conversely, neurons whose associated gradients are consistent in direction with the global model can be given higher priority for growth. In this way, FedSGC can greatly reduce local computation and communication overheads while also enhancing the generalization ability of FL. We evaluate our method on challenging non-i.i.d. settings and show that it achieves accuracy competitive with state-of-the-art FL methods across various scenarios while minimizing computation and communication costs.
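
The prune-and-grow idea described above can be illustrated with a small sketch. It is not the paper's algorithm: only the direction map $d^{r+1}=sign(\theta^{r+1}-\theta^r)$ comes from the TL;DR, while the scoring rule, the per-weight (rather than per-neuron) granularity, and the names `congruity_scores` and `prune_and_grow` are assumptions made for illustration.

```python
from typing import List


def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def direction_map(theta_new: List[float], theta_old: List[float]) -> List[int]:
    """Global direction map d^{r+1} = sign(theta^{r+1} - theta^r)."""
    return [sign(n - o) for n, o in zip(theta_new, theta_old)]


def congruity_scores(local_grad: List[float], d: List[int]) -> List[float]:
    """Hypothetical score: gradient magnitude signed by agreement with d.

    A gradient-descent step moves a weight along -grad, so the step agrees
    with the global direction when sign(-grad) equals d. Entries where d is
    0 get a neutral score of 0 in this sketch.
    """
    return [abs(g) * (sign(-g) * di) for g, di in zip(local_grad, d)]


def prune_and_grow(mask: List[int], scores: List[float], k: int) -> List[int]:
    """Swap k weights: drop the lowest-scoring active ones and activate the
    highest-scoring inactive ones, keeping the active count unchanged."""
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    k = min(k, len(active), len(inactive))
    pruned = sorted(active, key=lambda i: scores[i])[:k]
    grown = sorted(inactive, key=lambda i: scores[i], reverse=True)[:k]
    new_mask = list(mask)
    for i in pruned:
        new_mask[i] = 0
    for i in grown:
        new_mask[i] = 1
    return new_mask


# Toy example with four weights (all values are made up).
theta_old = [0.0, 1.0, -1.0, 0.5]
theta_new = [0.2, 0.8, -1.0, 0.9]
d = direction_map(theta_new, theta_old)
local_grad = [-0.3, -0.5, 0.4, 0.1]
scores = congruity_scores(local_grad, d)
new_mask = prune_and_grow([0, 1, 1, 1], scores, k=1)
```

Because each pruned weight is matched by one grown weight, the number of active weights, and hence the sparsity level, stays fixed across the swap. That is the property this sketch is meant to show, independent of the exact scoring rule.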

Paper Structure (14 sections, 5 equations, 5 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Methodology
The Prune-and-Grow Mechanism for Sparsity Training
Gradient guided Pruning and Growing
Global Aggregation
Experiments
Results on MNIST
Results on CIFAR10
Conclusions
Appendix
Communication and Computation Savings
Results on PACS
Details of Algorithm

Figures (5)

Figure 1: Results on pathological non-iid MNIST dataset.
Figure 2: Results on pathological non-iid MNIST dataset of different sparsity.
Figure 3: Results on pathological non-iid CIFAR10 dataset.
Figure 4: Results on Dirichlet distributed CIFAR10 dataset of different $\beta$ value.
Figure 5: Results on PACS dataset.
