FedZMG: Efficient Client-Side Optimization in Federated Learning

Fotios Zantalis, Evangelos Zervas, Grigorios Koulouras

TL;DR

This paper introduces Federated Zero Mean Gradients (FedZMG), a novel, parameter-free, client-side optimization algorithm that tackles client-drift by structurally regularizing the optimization space. It achieves better convergence speed and final validation accuracy than the baseline FedAvg and the adaptive optimizer FedAdam.

Abstract

Federated Learning (FL) enables distributed model training on edge devices while preserving data privacy. However, clients tend to have non-Independent and Identically Distributed (non-IID) data, which often leads to client-drift and, in turn, to slower convergence and degraded model performance. While adaptive optimizers have been proposed to mitigate these effects, they frequently introduce computational complexity or communication overhead unsuitable for resource-constrained IoT environments. This paper introduces Federated Zero Mean Gradients (FedZMG), a novel, parameter-free, client-side optimization algorithm designed to tackle client-drift by structurally regularizing the optimization space. Advancing the idea of Gradient Centralization, FedZMG projects local gradients onto a zero-mean hyperplane, effectively neutralizing the "intensity" or "bias" shifts inherent in heterogeneous data distributions without requiring additional communication or hyperparameter tuning. A theoretical analysis is provided, proving that FedZMG reduces the effective gradient variance and guarantees tighter convergence bounds than standard FedAvg. Extensive empirical evaluations on the EMNIST, CIFAR100, and Shakespeare datasets demonstrate that FedZMG achieves better convergence speed and final validation accuracy than the baseline FedAvg and the adaptive optimizer FedAdam, particularly in highly non-IID settings.
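
The projection onto a zero-mean hyperplane described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the simplest reading: the gradient is treated as one flat vector and its mean is subtracted, which is the orthogonal projection onto the set of vectors whose entries sum to zero. The function names `project_zero_mean` and `local_sgd_step` are hypothetical, and the abstract does not say whether FedZMG applies the projection per layer, per neuron, or over the full parameter vector, so this should not be read as the authors' implementation.

```python
import numpy as np


def project_zero_mean(grad: np.ndarray) -> np.ndarray:
    """Project a gradient onto the hyperplane {g : sum(g) = 0}.

    Subtracting the mean equals applying P = I - (1/d) * 1 1^T
    to the flattened gradient (assumed granularity: whole array).
    """
    return grad - grad.mean()


def local_sgd_step(weights: np.ndarray, grad: np.ndarray, lr: float) -> np.ndarray:
    """One hypothetical client-side SGD step using the projected gradient."""
    return weights - lr * project_zero_mean(grad)


# Two clients whose gradients differ only by a constant offset
# (a stand-in for an "intensity" or "bias" shift in their data).
g_client_a = np.array([1.0, 2.0, 6.0])
g_client_b = g_client_a + 5.0

g_a = project_zero_mean(g_client_a)
g_b = project_zero_mean(g_client_b)

# After projection the constant offset is gone, so both clients
# would take the same update direction.
w_next = local_sgd_step(np.zeros(3), g_client_a, lr=0.1)
```

In this toy case the two projected gradients coincide, which is the sense in which a constant shift across clients is neutralized. The projection needs no extra state or communication, and as an orthogonal projection it never increases the gradient norm.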

Paper Structure

This paper contains 30 sections, 4 theorems, 26 equations, 2 figures, 8 tables, 1 algorithm.

Key Result

Lemma 1

If $n_t\le \frac{1}{4L}$ and $\mathbf{1}^T(\bar{\mathbf{w}}_0-\mathbf{w}^{\ast})=0$, then [...], where $\Gamma=F^{\ast}-\sum_k p_k F_k^{\ast}$.

Figures (2)

  • Figure 1: Client-server ($\eta_c-\eta_s$) learning rate tuning via grid search across three datasets: CIFAR100 (top), Shakespeare (middle), and EMNIST (bottom).
  • Figure 2: Validation accuracy comparison. Top: CIFAR100; Middle: Shakespeare; Bottom: EMNIST.

Theorems & Definitions (8)

  • Lemma 1: One-Step Progress
  • proof
  • Lemma 2: Bounding variance
  • proof
  • Lemma 3: Bounding divergence
  • proof
  • Theorem 1: Convergence of FedZMG
  • proof