Neighborhood and Global Perturbations Supported SAM in Federated Learning: From Local Tweaks To Global Awareness

Boyuan Li, Zihao Peng, Yafei Li, Mingliang Xu, Shengbo Chen, Baofeng Ji, Cong Shen

TL;DR

FedTOGA addresses the core challenge of data heterogeneity in Federated Learning by unifying local perturbations with global optimization through a novel three-pronged approach: (1) estimating global perturbations from server updates to guide local SAM steps, (2) introducing neighborhood perturbations that reuse cached gradients to reduce local computation, and (3) applying global correction to local dynamic regularizers via an ADMM-like mechanism. This results in reduced uplink communication and storage overhead while ensuring alignment between local and global objectives, yielding an $O(1/T)$ convergence rate for non-convex objectives. The method is theoretically analyzed under standard FL assumptions and empirically validated on CIFAR-10/100 with Dirichlet and Pathological data splits, where FedTOGA outperforms 17 baselines in final accuracy and convergence speed. The work thus advances practical, robust, and scalable Federated Learning in highly heterogeneous and bandwidth-constrained environments.
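To make the first prong concrete, here is a minimal sketch of a generic SAM step on a toy quadratic. An optional server-provided direction is blended into the perturbation. The blend rule and the names `global_dir` and `mix` are assumptions made for illustration only; they are not taken from the paper and should not be read as FedTOGA's actual update.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.05, rho=0.05, global_dir=None, mix=0.5):
    """One SAM-style step.

    `global_dir` and `mix` are hypothetical knobs: they blend a server-side
    direction into the ascent direction before normalisation.
    """
    g = grad_fn(w)
    # Ascent direction: local gradient, optionally blended with a global one.
    direction = g if global_dir is None else (1.0 - mix) * g + mix * global_dir
    # Perturbation of radius rho along the normalised direction.
    eps = rho * direction / (np.linalg.norm(direction) + 1e-12)
    # Descend using the gradient evaluated at the perturbed point.
    return w - lr * grad_fn(w + eps)

# Toy objective f(w) = 0.5 * w^T A w, so the gradient is A w.
A = np.diag([1.0, 10.0])

def grad_fn(w):
    return A @ w

w = np.array([1.0, 1.0])
for _ in range(200):
    w = sam_step(w, grad_fn)
```

On this toy problem the iterate settles in a small neighborhood of the minimizer whose size scales with `rho`, which is the usual behavior of a fixed-radius perturbation.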

Abstract

Federated Learning (FL) lets clients collaboratively build a privacy-preserving model under the orchestration of a central server, without exchanging data. However, data heterogeneity across participants causes local optima to diverge, which in turn degrades convergence. Recent research has focused on global sharpness-aware minimization (SAM) and dynamic regularization techniques to enhance consistency between global and local generalization and optimization objectives. Nonetheless, estimating global SAM introduces additional computational and memory overhead, while dynamic regularization suffers from bias in the local and global dual variables due to training isolation. In this paper, we propose a novel FL algorithm, FedTOGA, designed to address both optimization and generalization objectives while maintaining minimal uplink communication overhead. Linking local perturbations to global updates improves global generalization consistency. Additionally, global updates are used to correct local dynamic regularizers, reducing dual-variable bias and enhancing optimization consistency. Clients receive global updates passively, which reduces overhead. We also propose neighborhood perturbation to approximate local perturbation and analyze its strengths and limitations. Theoretical analysis shows that FedTOGA achieves an $O(1/T)$ convergence rate for non-convex functions. Empirical studies demonstrate that FedTOGA outperforms state-of-the-art algorithms, with a 1% accuracy increase and 30% faster convergence.

Paper Structure (25 sections, 2 theorems, 11 equations, 6 figures, 9 tables, 3 algorithms)

Key Result

Theorem 1

Under Assumptions 1-3, for any training interval $t$ on the $i$-th client, the model divergence satisfies a bound stated in terms of $H_i(\tau)$, where $H_i(\tau) \leq \frac{L^2\rho^2\sigma_g^{'2}+\sigma_g^2}{2L^2}((1+2\eta_l^2L^2)^{\tau}-1)$ and $\{v^t\}$ is a virtual sequence representing the global model. See the Appendix for details.
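To get a feel for how the right-hand side of the $H_i(\tau)$ bound grows with the number of local steps, the sketch below simply evaluates it. It assumes $\sigma_g^{'2}$ denotes the square of a constant $\sigma_g'$; the function and argument names are illustrative, not from the paper.

```python
def divergence_bound(tau, L, rho, sigma_g, sigma_g_prime, eta_l):
    """Evaluate (L^2 rho^2 sigma_g'^2 + sigma_g^2) / (2 L^2) * ((1 + 2 eta_l^2 L^2)^tau - 1).

    Assumes sigma_g_prime is the constant whose square appears in the bound.
    """
    prefactor = (L**2 * rho**2 * sigma_g_prime**2 + sigma_g**2) / (2 * L**2)
    growth = (1 + 2 * eta_l**2 * L**2) ** tau - 1
    return prefactor * growth

# Arbitrary illustrative constants: the bound is zero at tau = 0 and grows
# geometrically in tau, faster for a larger local learning rate eta_l.
bounds = [divergence_bound(tau, L=1.0, rho=0.1, sigma_g=1.0,
                           sigma_g_prime=1.0, eta_l=0.1) for tau in range(6)]
```

The geometric factor $(1+2\eta_l^2L^2)^{\tau}$ shows why a small local learning rate or a short local interval keeps the bound tight.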

Figures (6)

  • Figure 1: Panels (a)-(c) show the loss surface under IID and Non-IID FL settings; panels (d)-(f) show the FL system, where gray represents the global consensus and the colored regions represent local knowledge. In (d), FL supported only by the SAM optimizer cannot increase consensus further. In (e), a dynamic regularizer, as introduced in some prior work, increases global generalization. In (f), we further introduce the Global Update to extend generalization.
  • Figure 2: Hyperparameter sensitivity studies of learning-rate decay, penalty coefficient $\alpha$, correction coefficients $\beta, \kappa$, and perturbation coefficient $\rho$ on CIFAR-10.
  • Figure 3: Illustration of the perturbation technique and its variants.
  • Figure 4: Heatmaps of the data distributions for CIFAR-10 and CIFAR-100 under Dirichlet distributions with coefficients of 0.6 and 0.1, respectively, and under Pathological sampling with coefficients of 6/20 and 3/10. Both datasets consistently include 100 / 200 clients.
  • Figure 5: Accuracy / loss on the CIFAR-10 dataset under 10% / 5% participation of a total of 100 / 200 clients.
  • ...and 1 more figure
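The Dirichlet splits mentioned for Figure 4 can be pictured with a common label-based partitioning scheme, sketched below. This is a widely used recipe and an assumption here; the paper's exact sampling procedure may differ, and `dirichlet_partition` is a hypothetical helper name.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients, class by class.

    For each class, client shares are drawn from Dirichlet(alpha); a smaller
    alpha yields a more skewed (more heterogeneous) split.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Turn cumulative shares into cut points within this class.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

# Synthetic labels: 10 classes with 100 samples each, split over 100 clients.
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels, num_clients=100, alpha=0.6)
```

Every sample lands on exactly one client, and some clients may receive no samples of a given class, which is the kind of imbalance the heatmaps visualize.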

Theorems & Definitions (5)

  • Theorem 1
  • Remark 1
  • Theorem 2
  • Remark 2
  • Remark 3