Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning
Debora Caldarola, Pietro Cagnasso, Barbara Caputo, Marco Ciccone
TL;DR
This paper addresses a known problem in federated learning (FL): data heterogeneity across clients drives the global model toward sharp minima, which hurts generalization. It introduces FedGloSS, a server-side Sharpness-Aware Minimization (SAM) approach that optimizes global sharpness directly. To stay communication-efficient, FedGloSS approximates the SAM perturbation with the previous global pseudo-gradient and uses ADMM to align local and global solutions. The method reaches flatter global minima and higher accuracy at reduced communication cost on standard FL benchmarks (CIFAR-10/100) and on real-world-scale data (Landmarks-160k), outperforming state-of-the-art methods, especially under high heterogeneity. The results indicate that managing global sharpness on the server, combined with global-local consistency, yields practical gains for real-world FL deployments where communication is a bottleneck.
Abstract
Federated learning (FL) enables collaborative model training with privacy preservation. Data heterogeneity across edge devices (clients) can cause models to converge to sharp minima, negatively impacting generalization and robustness. Recent approaches use client-side sharpness-aware minimization (SAM) to encourage flatter minima, but the discrepancy between local and global loss landscapes often undermines their effectiveness, as optimizing for local sharpness does not ensure global flatness. This work introduces FedGloSS (Federated Global Server-side Sharpness), a novel FL approach that prioritizes the optimization of global sharpness on the server, using SAM. To reduce communication overhead, FedGloSS cleverly approximates sharpness using the previous global gradient, eliminating the need for additional client communication. Our extensive evaluations demonstrate that FedGloSS consistently reaches flatter minima and better performance compared to state-of-the-art FL methods across various federated vision benchmarks.
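The core idea of reusing the previous round's pseudo-gradient as the SAM ascent direction can be illustrated with a small toy sketch. This is not the authors' implementation: the quadratic client losses, the helper names `local_train` and `server_round`, and the hyperparameters `rho` and `server_lr` are all assumptions made for illustration, and the ADMM alignment step is left out entirely.

```python
import numpy as np

def local_train(start, target, lr=0.1, steps=5):
    """Hypothetical client: a few gradient steps on 0.5 * ||w - target||^2."""
    w = start.copy()
    for _ in range(steps):
        w -= lr * (w - target)  # gradient of the toy quadratic loss
    return w

def server_round(w, prev_delta, client_targets, rho=0.05, server_lr=1.0):
    """One round of a server-side SAM-style update (illustrative only)."""
    norm = np.linalg.norm(prev_delta)
    # Ascent direction approximated by the previous round's pseudo-gradient,
    # so no extra client exchange is needed to compute the perturbation.
    eps = rho * prev_delta / norm if norm > 0 else np.zeros_like(w)
    w_perturbed = w + eps
    # Clients start local training from the perturbed global model.
    client_models = [local_train(w_perturbed, t) for t in client_targets]
    # Pseudo-gradient: perturbed starting point minus the averaged client models.
    delta = w_perturbed - np.mean(client_models, axis=0)
    # SAM-style step: the pseudo-gradient measured at the perturbed point
    # is applied to the unperturbed global model.
    w_next = w - server_lr * delta
    return w_next, delta

# Three heterogeneous toy clients whose optima average to (0, 1).
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, 2.0])]
w = np.zeros(2)
delta = np.zeros(2)  # no previous pseudo-gradient in the first round
for _ in range(50):
    w, delta = server_round(w, delta, targets)
```

In this toy setting the global model settles within roughly `rho` of the average client optimum, because the perturbation keeps a fixed norm even as the pseudo-gradient shrinks. Each round still involves a single model broadcast and a single round of client uploads, which is the communication property the abstract emphasizes.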
