Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning

Debora Caldarola, Pietro Cagnasso, Barbara Caputo, Marco Ciccone

TL;DR

This paper tackles the challenge that heterogeneous federated learning data induce sharp global minima, hindering generalization. It introduces FedGloSS, a server-side Sharpness-Aware Minimization (SAM) approach that optimizes global sharpness while preserving communication efficiency by approximating the SAM perturbation with the previous global pseudo-gradient and leveraging ADMM to align local and global solutions. The method achieves flatter global minima, better accuracy, and reduced communication costs across standard FL benchmarks (Cifar-10/100) and real-world-scale data (Landmarks-160k), outperforming state-of-the-art methods especially under high heterogeneity. The work demonstrates that coordinating global sharpness management on the server, combined with global-local consistency, yields practical improvements for real-world FL deployments where communication is a bottleneck.

Abstract

Federated learning (FL) enables collaborative model training with privacy preservation. Data heterogeneity across edge devices (clients) can cause models to converge to sharp minima, negatively impacting generalization and robustness. Recent approaches use client-side sharpness-aware minimization (SAM) to encourage flatter minima, but the discrepancy between local and global loss landscapes often undermines their effectiveness, as optimizing for local sharpness does not ensure global flatness. This work introduces FedGloSS (Federated Global Server-side Sharpness), a novel FL approach that prioritizes the optimization of global sharpness on the server, using SAM. To reduce communication overhead, FedGloSS cleverly approximates sharpness using the previous global gradient, eliminating the need for additional client communication. Our extensive evaluations demonstrate that FedGloSS consistently reaches flatter minima and better performance compared to state-of-the-art FL methods across various federated vision benchmarks.
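The server-side step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`sam_perturbation`, `local_train`, `server_round`, `run`), the toy quadratic client losses, the values of `rho` and `server_lr`, and the sign convention for the pseudo-gradient (starting model minus averaged client model) are all hypothetical choices made for the example. The ADMM alignment of local and global solutions is omitted entirely. The point it shows is that the ascent direction comes from the pseudo-gradient already computed in the previous round, so the perturbation needs no extra client-server exchange.

```python
import numpy as np


def sam_perturbation(prev_pseudo_grad, rho, eps=1e-12):
    """Ascent step of radius rho along the previous global pseudo-gradient.

    Assumption: this stands in for the usual SAM step rho * g / ||g||,
    with the stale pseudo-gradient used in place of the true gradient g.
    """
    norm = np.linalg.norm(prev_pseudo_grad)
    if norm < eps:
        # No direction available yet (e.g. first round): do not perturb.
        return np.zeros_like(prev_pseudo_grad)
    return rho * prev_pseudo_grad / norm


def local_train(w_start, target, lr=0.1, steps=5):
    """Hypothetical client: gradient descent on 0.5 * ||w - target||^2."""
    w = w_start.copy()
    for _ in range(steps):
        w -= lr * (w - target)
    return w


def server_round(w, prev_pseudo_grad, client_targets, rho=0.05, server_lr=1.0):
    """One communication round with a server-side SAM-style perturbation."""
    # 1) Perturb the global model using information from the previous round.
    w_perturbed = w + sam_perturbation(prev_pseudo_grad, rho)
    # 2) Clients start local training from the perturbed model.
    client_models = [local_train(w_perturbed, t) for t in client_targets]
    # 3) Pseudo-gradient: starting point minus the averaged client model.
    pseudo_grad = w_perturbed - np.mean(client_models, axis=0)
    # 4) Apply the sharpness-aware direction to the unperturbed model.
    w_next = w - server_lr * pseudo_grad
    return w_next, pseudo_grad


def run(rounds=50):
    """Two heterogeneous toy clients whose optima disagree."""
    targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    w = np.zeros(2)
    prev_pseudo_grad = np.zeros(2)
    for _ in range(rounds):
        w, prev_pseudo_grad = server_round(w, prev_pseudo_grad, targets)
    return w
```

Calling `run()` drives the global model toward the average of the two client optima. In this toy setting the perturbation only adds a bounded offset of at most `rho` per round, since a quadratic loss has the same curvature everywhere and so has no flatter region to prefer.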

Paper Structure

This paper contains 56 sections, 9 equations, 21 figures, 15 tables, 1 algorithm.

Figures (21)

  • Figure 1: Comparison of FedAvg (solid) and FedSam (net) loss landscapes with varying degrees of data heterogeneity ($\alpha$) on the Cifar datasets. FedSam's ability to converge to globally flat minima depends strongly on two factors: the data heterogeneity, with higher heterogeneity ($\alpha \rightarrow 0$) leading to sharper minima, and the complexity of the task, e.g., higher sharpness for the more complex Cifar100. This highlights the importance of optimizing global sharpness. Model: CNN.
  • Figure 2: Global vs. local perspective on FedSam. Cifar100 with $\alpha=0$ @ $20k$ rounds on CNN. Local models trained on one class, tested on the local (bottom landscape) or global dataset (top landscape). Models trained with FedSam present significant differences between local and global behaviors.
  • Figure 3: Illustration of FedGloSS. The model $\boldsymbol{w}^t$ is perturbed using $\tilde{\Delta}_{\boldsymbol{w}}^{t-1}$. The sharpness-aware direction (dashed) is used to compute $\boldsymbol{w}^{t+1}$ (solid), which lands in a flat region. Shown in comparison with FedAvg.
  • Figure 4: Global vs. local perspective of FedGloSS and FedSmoo. Loss landscapes of client models trained on one class, tested on the local ("Local loss") or global dataset ("Global loss"). Cifar100 with $\alpha=0$ and Sam as local optimizer @ $t=20k$, CNN. (a)-(b): Models trained with FedGloSS. Global loss of FedSam's local model (net) as reference. (c)-(d): Models trained with FedSmoo. Global loss of FedGloSS' local model (net) as reference. FedGloSS achieves better local-global consistency than FedSmoo.
  • Figure 5: Trend of the difference $\delta_{\boldsymbol{\epsilon}}^t$, which decreases as ADMM is used and over training rounds. Cifar datasets, CNN.
  • ...and 16 more figures