Neighborhood and Global Perturbations Supported SAM in Federated Learning: From Local Tweaks To Global Awareness
Boyuan Li, Zihao Peng, Yafei Li, Mingliang Xu, Shengbo Chen, Baofeng Ji, Cong Shen
TL;DR
FedTOGA addresses the core challenge of data heterogeneity in Federated Learning by unifying local perturbations with global optimization through three components: (1) estimating global perturbations from server updates to guide local SAM steps, (2) introducing neighborhood perturbations that reuse cached gradients to reduce local computation, and (3) applying a global correction to local dynamic regularizers via an ADMM-like mechanism. Together, these keep uplink communication and storage overhead low while aligning local and global objectives. Under standard FL assumptions, the method provably attains an $O(1/T)$ convergence rate for non-convex objectives. Experiments on CIFAR-10/100 with Dirichlet and pathological data splits show that FedTOGA outperforms 17 baselines in both final accuracy and convergence speed. This makes it a practical, robust, and scalable option for highly heterogeneous, bandwidth-constrained Federated Learning.
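The first two components can be illustrated with a toy sketch. It is not the authors' implementation, and it rests on several assumptions. The client loss is a hypothetical quadratic. The global perturbation direction is taken to be the normalized reverse of the last server update, `w_global_prev - w_global`, which points roughly along the averaged gradient. The neighborhood perturbation reuses the gradient cached at the previous local step, so each step costs one gradient evaluation instead of plain SAM's two. The names `local_round`, `loss_grad`, `unit`, and `rho` are illustrative, as is the equal-weight sum of the two directions.

```python
import numpy as np


def loss_grad(w, A, b):
    """Gradient of a hypothetical client loss f(w) = 0.5 * w^T A w - b^T w."""
    return A @ w - b


def unit(v, eps=1e-12):
    """Scale v to (at most) unit length; returns ~0 for the zero vector."""
    return v / (np.linalg.norm(v) + eps)


def local_round(w_global, w_global_prev, A, b, lr=0.1, rho=0.05, steps=20):
    """One client's local SAM-style training, sketched under assumptions.

    - Global perturbation (assumed form): the reverse of the last server
      update, w_global_prev - w_global, approximates the ascent direction
      of the global loss at no extra cost to the client.
    - Neighborhood perturbation (assumed form): the gradient cached at the
      previous local step stands in for a fresh ascent gradient, so each
      step needs one gradient evaluation instead of two.
    """
    w = w_global.copy()
    global_dir = unit(w_global_prev - w_global)
    cached_g = np.zeros_like(w)  # nothing is cached before the first step
    for _ in range(steps):
        # Equal-weight mix of the two ascent directions (illustrative choice).
        direction = unit(global_dir + unit(cached_g))
        g = loss_grad(w + rho * direction, A, b)  # gradient at perturbed point
        w = w - lr * g
        cached_g = g  # reused as the next step's neighborhood perturbation
    return w


if __name__ == "__main__":
    A = np.diag([1.0, 2.0])
    b = np.array([1.0, 2.0])  # the toy loss is minimized at w = [1, 1]
    w_prev, w = np.zeros(2), np.zeros(2)
    for _ in range(10):
        # Single-client toy: the server model is just the returned client model.
        w_prev, w = w, local_round(w, w_prev, A, b)
    print(w)
```

After ten rounds of twenty local steps, the model ends within about 0.1 of the minimizer [1, 1]. The remaining offset is at most about `rho` per coordinate and comes from evaluating gradients at perturbed points rather than at `w` itself.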
Abstract
In Federated Learning (FL), clients collaboratively train a privacy-preserving model under the orchestration of a central server without exchanging their data. However, heterogeneity in participants' data causes their local optima to diverge, which degrades convergence. Recent research has turned to global sharpness-aware minimization (SAM) and dynamic regularization to improve consistency between the global and local objectives, in terms of both generalization and optimization. Nonetheless, estimating the global SAM perturbation introduces additional computational and memory overhead. Dynamic regularization, in turn, suffers from bias between the local and global dual variables because clients train in isolation. In this paper, we propose FedTOGA, a novel FL algorithm that addresses both optimization and generalization consistency while keeping uplink communication overhead minimal. Linking local perturbations to global updates improves global generalization consistency. Global updates are also used to correct the local dynamic regularizers, which reduces dual-variable bias and improves optimization consistency. Clients receive these global updates passively, which keeps overhead low. We also propose a neighborhood perturbation that approximates the local perturbation, and we analyze its strengths and limitations. Theoretical analysis shows that FedTOGA achieves a faster convergence rate of $O(1/T)$ for non-convex objectives. Empirical studies show that FedTOGA outperforms state-of-the-art algorithms, improving accuracy by 1\% and converging 30\% faster.
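For readers unfamiliar with the dual variables mentioned above, dynamic regularization is commonly instantiated as in FedDyn (Acar et al., 2021). In FedDyn, each client keeps a dual variable `h` that is updated ADMM-style after every local solve. The sketch below shows only that standard client-side update on a hypothetical quadratic loss. The summary does not specify how FedTOGA's global correction modifies `h`, so that step is deliberately omitted. The function names and the values of `alpha`, `lr`, and `steps` are illustrative assumptions.

```python
import numpy as np


def local_grad(w, A, b):
    """Gradient of the hypothetical client loss f(w) = 0.5 * w^T A w - b^T w."""
    return A @ w - b


def feddyn_client_update(w_global, h, A, b, alpha=0.1, lr=0.1, steps=200):
    """Standard FedDyn-style client step (not FedTOGA's corrected version).

    Minimizes f(w) - <h, w> + (alpha / 2) * ||w - w_global||^2 by gradient
    descent, then applies the dual update h <- h - alpha * (w - w_global).
    """
    w = w_global.copy()
    for _ in range(steps):
        # Gradient of the dynamically regularized local objective.
        g = local_grad(w, A, b) - h + alpha * (w - w_global)
        w = w - lr * g
    h_new = h - alpha * (w - w_global)  # ADMM-like dual variable update
    return w, h_new


if __name__ == "__main__":
    A = np.diag([1.0, 2.0])
    b = np.array([1.0, 2.0])
    w, h = feddyn_client_update(np.zeros(2), np.zeros(2), A, b)
    # Once the local solve has converged, h matches the local gradient at w.
    print(np.max(np.abs(h - local_grad(w, A, b))))
```

One useful property follows from this update. At a stationary point of the regularized local objective, the updated `h` equals the client's own loss gradient at `w`. Each client's dual variable therefore tracks only local information. This isolation is the source of the local/global dual-variable bias that the abstract describes.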
