
GC-Fed: Gradient Centralized Federated Learning with Partial Client Participation

Jungwon Seo, Ferhat Ozgur Catak, Chunming Rong, Kibeom Hong, Minhoe Kim

TL;DR

GC-Fed introduces gradient centralization (GC) into federated learning as a way to reduce client drift under heterogeneous data and partial participation without relying on historical references such as past gradients or previous global models. By applying Local GC to feature-extraction layers and Global GC to classifier layers, the approach centralizes updates with respect to a shared hyperplane reference at no extra communication cost. Theoretical analysis shows that the projected-gradient updates reduce the optimality gap more efficiently than FedAvg, and extensive experiments on EMNIST, CIFAR, and TinyImageNet show substantial accuracy gains and faster convergence. This layer-aware, projection-based method is a practical enhancement for cross-device FL, with strong empirical support and hyperparameters that can be tuned to different architectures and data distributions.
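The "shared hyperplane reference" can be made concrete with a small numeric check. Subtracting the mean from a gradient vector of length n is the same as multiplying it by the orthogonal projector P = I - (1/n) 1 1ᵀ, which maps every vector onto the hyperplane {x : sum(x) = 0}. The sketch below assumes a flat gradient vector; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def centralize(g: np.ndarray) -> np.ndarray:
    """Gradient centralization of a 1-D gradient: subtract its mean."""
    return g - g.mean()

def projection_matrix(n: int) -> np.ndarray:
    """P = I - (1/n) * 1 1^T, the orthogonal projector onto {x : sum(x) = 0}."""
    ones = np.ones((n, 1))
    return np.eye(n) - ones @ ones.T / n

rng = np.random.default_rng(0)
g = rng.normal(size=5)
P = projection_matrix(5)

# Centralizing and projecting produce the same vector.
print(np.allclose(P @ g, centralize(g)))
# The projected gradient lies on the hyperplane sum(x) = 0.
print(np.isclose((P @ g).sum(), 0.0, atol=1e-12))
# P is idempotent: projecting twice changes nothing.
print(np.allclose(P @ P, P))
```

Because every client projects onto the same fixed hyperplane, the reference needs no history and no extra communication, which is the property the TL;DR contrasts with snapshot-based drift corrections.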

Abstract

Federated Learning (FL) enables privacy-preserving multi-source information fusion (MSIF) but is challenged by client drift in highly heterogeneous data settings. Many existing drift-mitigation strategies rely on reference-based techniques (such as gradient adjustments or proximal loss) that use historical snapshots (e.g., past gradients or previous global models) as reference points. When only a subset of clients participates in each training round, these historical references may not accurately capture the overall data distribution, leading to unstable training. In contrast, our proposed Gradient Centralized Federated Learning (GC-Fed) employs a hyperplane as a history-independent reference point to guide local training and enhance inter-client alignment. GC-Fed comprises two complementary components: Local GC, which centralizes gradients during local training, and Global GC, which centralizes updates during server aggregation. In our hybrid design, Local GC is applied to feature-extraction layers to harmonize client contributions, while Global GC refines classifier layers to stabilize round-wise performance. Theoretical analysis and extensive experiments on benchmark FL tasks demonstrate that GC-Fed effectively mitigates client drift and achieves up to a 20% improvement in accuracy under heterogeneous and partial participation conditions.
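For weight tensors rather than flat vectors, gradient centralization is usually applied per output unit: each output neuron's (or channel's) gradient has its mean over the fan-in removed. A minimal sketch, assuming GC-Fed uses this standard operator and that the first axis indexes output units (e.g. a fully connected weight of shape (out, in) or a conv kernel of shape (out, in, kh, kw)); the helper name gc_weight_grad is hypothetical.

```python
import numpy as np

def gc_weight_grad(grad: np.ndarray) -> np.ndarray:
    """Centralize a weight gradient per output unit.

    Axis 0 is assumed to index output units; the mean is taken over all
    remaining axes. 1-D tensors such as biases are returned unchanged.
    """
    if grad.ndim <= 1:
        return grad
    axes = tuple(range(1, grad.ndim))
    return grad - grad.mean(axis=axes, keepdims=True)

rng = np.random.default_rng(1)
conv_grad = rng.normal(size=(8, 3, 3, 3))  # conv kernel gradient
fc_grad = rng.normal(size=(10, 64))        # fully connected gradient

conv_gc = gc_weight_grad(conv_grad)
fc_gc = gc_weight_grad(fc_grad)

# Largest remaining per-output-unit mean after centralization.
print(np.abs(conv_gc.mean(axis=(1, 2, 3))).max())
print(np.abs(fc_gc.mean(axis=1)).max())
```

In the hybrid design described above, an operator like this would be applied to gradients of feature-extraction layers during local SGD (Local GC), and to the aggregated classifier update on the server (Global GC).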

Paper Structure

This paper contains 33 sections, 2 theorems, 26 equations, 10 figures, 8 tables, and 1 algorithm.

Key Result

Lemma 1

One step of the projected-gradient update reduces the gap between $\mathbf{w}^*$ and $\mathbf{w}^t$ by $\eta_\tau^2\|\overline{\Tilde{\mathbf{G}}}_\tau\|^2 + \eta_\tau^2 \| \Tilde{\mathbf{G}}_\tau - \overline{\Tilde{\mathbf{G}}}_\tau \|^2 -2\eta_\tau \langle\overline{\mathbf{w}}_\tau - \mathbf{w}^*, \overli
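The terms listed in Lemma 1 match those produced by the standard one-step expansion used in FedAvg-style convergence analyses. The derivation below is a sketch under assumptions not stated in this summary: the averaged iterate evolves as $\overline{\mathbf{w}}_{\tau+1} = \overline{\mathbf{w}}_\tau - \eta_\tau \Tilde{\mathbf{G}}_\tau$, and, conditioned on the history up to step $\tau$, $\Tilde{\mathbf{G}}_\tau$ is an unbiased estimate of $\overline{\Tilde{\mathbf{G}}}_\tau$, so the cross term vanishes in expectation. The paper's own proof may differ in detail.

$$
\begin{aligned}
\mathbb{E}\,\|\overline{\mathbf{w}}_{\tau+1} - \mathbf{w}^*\|^2
&= \mathbb{E}\,\big\|\big(\overline{\mathbf{w}}_\tau - \eta_\tau \overline{\Tilde{\mathbf{G}}}_\tau - \mathbf{w}^*\big) - \eta_\tau\big(\Tilde{\mathbf{G}}_\tau - \overline{\Tilde{\mathbf{G}}}_\tau\big)\big\|^2 \\
&= \|\overline{\mathbf{w}}_\tau - \eta_\tau \overline{\Tilde{\mathbf{G}}}_\tau - \mathbf{w}^*\|^2 + \eta_\tau^2\,\mathbb{E}\,\|\Tilde{\mathbf{G}}_\tau - \overline{\Tilde{\mathbf{G}}}_\tau\|^2 \\
&= \|\overline{\mathbf{w}}_\tau - \mathbf{w}^*\|^2 - 2\eta_\tau\langle\overline{\mathbf{w}}_\tau - \mathbf{w}^*, \overline{\Tilde{\mathbf{G}}}_\tau\rangle + \eta_\tau^2\|\overline{\Tilde{\mathbf{G}}}_\tau\|^2 + \eta_\tau^2\,\mathbb{E}\,\|\Tilde{\mathbf{G}}_\tau - \overline{\Tilde{\mathbf{G}}}_\tau\|^2
\end{aligned}
$$

The last line contains the squared-mean term, the variance term, and the inner-product term that appear in the lemma statement.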

Figures (10)

  • Figure 1: In partial participation scenarios, the gap between the true update (aggregated from all clients) and the partial update (aggregated from a subset of clients) widens as the number of participating clients decreases, leading to increased training instability, unlike in an i.i.d. setting.
  • Figure 2: Overview of the proposed GC-Fed framework for FL. Our approach incorporates Gradient Centralization in two phases: Local GC is applied to the feature extraction layers during local SGD, while Global GC is employed at the classifier layer during model aggregation. The application of these strategies is modulated by the hyperparameter $\lambda$.
  • Figure 3: Visualization of GC as a gradient projection method in FL.
  • Figure 4: Training dynamics of Top-1 test accuracy across different GC methods on CIFAR-10 with a CNN model (R: 800, participation: 5/200, LDA $\alpha = 0.05$). The curves are smoothed using a moving average window of 10, with shaded areas representing the original values. Dashed lines indicate the peak accuracy, while dotted vertical and horizontal lines mark the smoothed accuracy at round 200.
  • Figure 5: Data partitioning across different LDA $\alpha$ values in a 10-class, 10-client setting. Lower $\alpha$ values induce higher heterogeneity in class distribution and data volume per client, while higher $\alpha$ values yield a more homogeneous distribution.
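Figures 4 and 5 refer to LDA partitioning with concentration $\alpha$ and to partial participation (e.g. 5 of 200 clients per round). A minimal sketch of both, assuming the common per-class Dirichlet scheme (each class is split across clients according to proportions drawn from Dir($\alpha$)) and uniform client sampling without replacement; the labels are synthetic and the helper names are hypothetical, not the paper's code.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet(alpha) proportions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Fraction of class c assigned to each client; small alpha -> highly skewed.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        # Cut points along the shuffled indices of class c.
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_indices[k].extend(part.tolist())
    return [np.array(ix, dtype=int) for ix in client_indices]

def sample_clients(num_clients, per_round, rng):
    """Uniformly sample the subset of clients that participates in one round."""
    return rng.choice(num_clients, size=per_round, replace=False)

rng = np.random.default_rng(42)
labels = rng.integers(0, 10, size=5000)       # synthetic 10-class labels
parts = dirichlet_partition(labels, num_clients=200, alpha=0.05)
round_clients = sample_clients(200, 5, rng)   # 5/200 participation
print([len(parts[k]) for k in round_clients])
```

With $\alpha = 0.05$, most clients end up with very few classes and very different data volumes, so the handful of clients sampled in a round can be far from representative, which is the gap Figure 1 illustrates.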

Theorems & Definitions (6)

  • Definition 1: Local GC
  • Definition 2: Global GC
  • Lemma 1
  • Theorem 3
  • proof
  • proof
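Definitions 1 and 2 can be illustrated with a toy round that follows the hybrid design summarized above: Local GC on feature-extraction gradients during local SGD, Global GC on the aggregated classifier update at the server. This is a sketch under several assumptions: the layer names, the quadratic per-client loss, full participation, uniform averaging, and a server step size of 1 are all hypothetical, and the hyperparameter $\lambda$ from Figure 2 is replaced by explicit layer sets rather than modeled.

```python
import numpy as np

def gc(t):
    """Subtract the per-output-unit mean (axis 0 = output units); 1-D tensors unchanged."""
    if t.ndim <= 1:
        return t
    return t - t.mean(axis=tuple(range(1, t.ndim)), keepdims=True)

def local_train(weights, grad_fn, local_gc_layers, lr=0.1, steps=5):
    """Local SGD; Local GC is applied only to layers named in local_gc_layers."""
    w = {name: p.copy() for name, p in weights.items()}
    for _ in range(steps):
        grads = grad_fn(w)
        for name in w:
            g = gc(grads[name]) if name in local_gc_layers else grads[name]
            w[name] -= lr * g
    return w

def server_round(global_w, client_grad_fns, local_gc_layers, global_gc_layers):
    """FedAvg-style round; Global GC centralizes the averaged update of selected layers."""
    client_ws = [local_train(global_w, fn, local_gc_layers) for fn in client_grad_fns]
    new_w = {}
    for name, p in global_w.items():
        delta = np.mean([cw[name] - p for cw in client_ws], axis=0)
        if name in global_gc_layers:
            delta = gc(delta)
        new_w[name] = p + delta
    return new_w

# Toy clients (hypothetical): client k minimizes 0.5 * ||w[name] - c_k[name]||^2
# per layer, so its gradient is w - c_k; different targets mimic heterogeneity.
rng = np.random.default_rng(0)
shapes = {"features": (16, 8), "classifier": (4, 16)}
targets = [{n: rng.normal(size=s) for n, s in shapes.items()} for _ in range(5)]
grad_fns = [lambda w, c=c: {n: w[n] - c[n] for n in w} for c in targets]

w = {n: np.zeros(s) for n, s in shapes.items()}
for _ in range(3):
    w = server_round(w, grad_fns,
                     local_gc_layers={"features"}, global_gc_layers={"classifier"})
print({n: float(np.abs(p.mean(axis=1)).max()) for n, p in w.items()})
```

Starting from zero weights, both layers keep zero per-row means: the feature layer because every local step is centralized, the classifier because the server centralizes the averaged update. Neither path requires clients to store or exchange historical references.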