Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization

Ziqing Fan; Shengchao Hu; Jiangchao Yao; Gang Niu; Ya Zhang; Masashi Sugiyama; Yanfeng Wang

TL;DR

This work addresses the misalignment between local and global sharpness in federated learning (FL) caused by data heterogeneity, which degrades generalization when sharpness-aware minimization (SAM) is applied on clients. It introduces FedLESAM, a lightweight method in which each client estimates the global perturbation direction from the difference between the global model it received in its previous active round and the one it receives in the current round, so each local iteration needs only a single backpropagation. The authors prove a slightly tighter convergence bound than that of FedSAM and derive an estimation-error bound for the global perturbation direction. Empirically, FedLESAM and its variants achieve superior or competitive performance on four federated benchmarks under multiple data-splitting strategies while reducing computational overhead.

Notation: $F(w)$ and $F_i(w)$ denote the global and client losses, $\rho$ is the perturbation magnitude, and $w^{\mathrm{old}}_i$ is the global model that client $i$ received in its previous active round.

Abstract

In federated learning (FL), the multi-step update and data heterogeneity among clients often lead to a loss landscape with sharper minima, degrading the performance of the resulting global model. Prevalent federated approaches incorporate sharpness-aware minimization (SAM) into local training to mitigate this problem. However, the local loss landscapes may not accurately reflect the flatness of the global loss landscape in heterogeneous environments; as a result, minimizing local sharpness and calculating perturbations on client data might not bring the efficacy of SAM in FL in line with that in centralized training. To overcome this challenge, we propose FedLESAM, a novel algorithm that locally estimates the direction of the global perturbation on the client side as the difference between the global models received in the previous active round and the current round. Besides improving quality, FedLESAM also speeds up federated SAM-based approaches, since it performs only one backpropagation in each iteration. Theoretically, we prove a slightly tighter bound than that of the original FedSAM by ensuring consistent perturbation. Empirically, we conduct comprehensive experiments on four federated benchmark datasets under three partition strategies to demonstrate the superior performance and efficiency of FedLESAM.
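The estimation step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the plain-list model representation, the toy quadratic loss, and the sign convention (perturbing along the previous global model minus the current one, scaled to norm $\rho$) are all assumptions made for the example.

```python
import math

def estimate_global_perturbation(w_old, w_cur, rho, eps=1e-12):
    """Assumed form: rho * (w_old - w_cur) / ||w_old - w_cur||.

    w_old: global model received in the client's previous active round.
    w_cur: global model received in the current round.
    """
    diff = [a - b for a, b in zip(w_old, w_cur)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm < eps:
        # No usable direction (e.g. first participation): skip the perturbation.
        return [0.0] * len(diff)
    return [rho * d / norm for d in diff]

def local_step(w, delta, grad_fn, lr):
    """One local update using the gradient evaluated at the perturbed model."""
    perturbed = [wi + di for wi, di in zip(w, delta)]
    grad = grad_fn(perturbed)  # the only gradient evaluation in this step
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def quadratic_grad(w):
    """Gradient of the toy loss 0.5 * ||w||^2 (hypothetical client loss)."""
    return list(w)

w_old = [4.0, 5.0]  # hypothetical previous global model
w_cur = [1.0, 1.0]  # hypothetical current global model
delta = estimate_global_perturbation(w_old, w_cur, rho=0.5)
w_next = local_step(w_cur, delta, quadratic_grad, lr=0.1)
```

Because the perturbation comes from two stored model snapshots rather than from an extra gradient on client data, each step here calls `grad_fn` once, whereas a standard SAM step would call it twice.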

Paper Structure (39 sections, 10 theorems, 52 equations, 6 figures, 9 tables, 4 algorithms)

Introduction
Preliminaries
Basic Notations
Sharpness and SAM
Federated Learning via FedAvg
Rethink SAM in FL
When SAM Works in FL and Recent Works
Verification and Motivation.
Method: FedLESAM
Efficiently Estimate Global Perturbation on Client
Total Framework
Enhanced Variants
Theoretical Analysis
Basic Assumptions
Convergence Results and Trade-off
...and 24 more sections

Key Result

Theorem 1

Let Assumption asm:smooth_var-asm:grad_differ hold, with an independent $\rho$ under full participation, if choosing $\eta_\mathrm{l}=\frac{1}{\sqrt{T} E L}$ and $\eta_\mathrm{g}=\sqrt{E N}$, the sequence of $\{w^t\}$ generated by FedSAM and FedLESAM in Algorithm alg:fedgesam satisfies: where $C \geq (\frac{1}{2}-30 E^2 L^2 \eta_\mathrm{l}^2) \geq 0$. For FedSAM, $\Delta=\frac{120L^2\rho^2}{CET^2
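As a worked check of the step-size condition, the stated choice $\eta_\mathrm{l}=\frac{1}{\sqrt{T} E L}$ can be substituted into the lower bound on $C$. This is plain arithmetic on the quantities quoted in the theorem statement and assumes nothing about the part of the statement that is cut off:

```latex
\eta_\mathrm{l}^2 = \frac{1}{T E^2 L^2}
\;\Longrightarrow\;
30 E^2 L^2 \eta_\mathrm{l}^2 = \frac{30}{T},
\qquad
\frac{1}{2} - \frac{30}{T} \geq 0 \iff T \geq 60 .
```

Under this learning-rate choice, the requirement $\frac{1}{2}-30 E^2 L^2 \eta_\mathrm{l}^2 \geq 0$ therefore holds once the number of rounds satisfies $T \geq 60$, and the lower bound approaches $\frac{1}{2}$ as $T$ grows.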

Figures (6)

Figure 1: Panels 1-3 illustrate the loss surface for centralized training and for federated training under Dirichlet distributions with coefficients of 0.6 and 0.06. Panel 4 depicts the local update process of FedSAM: the perturbation is calculated on client data, and the local model is updated with the gradient of the model after perturbation. Panel 5 highlights the conflicts in sharpness minimization caused by discrepancies between local and global loss landscapes under data heterogeneity. Panel 6 demonstrates our local estimation of the global perturbation (opposite direction of the red arrow) via the global update (opposite direction of the black arrow).
Figure 2: Illustration of perturbation drift (left), ranging from 0 to 1, and global sharpness (right) during federated training. The experiment was conducted on CIFAR10 under the Dirichlet distribution with a coefficient of 0.1, with 100 clients and an active ratio of 10%.
Figure 3: Visualization of the global loss surface on CIFAR10 under the Dirichlet distribution with coefficient 0.1 for FedAvg, FedSAM, FedGAMMA, FedSMOO and our FedLESAM-D. We divide the dataset among 100 clients, and in each round 10% of the clients are active.
Figure 4: Ablation study on $\log\frac{\rho}{\eta_\mathrm{l}}$, where $\eta_\mathrm{l}$ is local learning rate. From left to right, we show the test accuracy on CIFAR10 and CIFAR100 ($\eta_\mathrm{l}=0.1$) and the averaged test accuracy of all target domains on OfficeHome and DomainNet ($\eta_\mathrm{l}=0.001$) with different $\rho$.
Figure 5: Heatmap of the data distribution of CIFAR10 and CIFAR100 under the Dirichlet distribution with coefficients $\beta$ of $0.6$ and $0.1$. The two datasets are divided into 100 and 200 clients, respectively.
...and 1 more figures
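The captions above refer to client splits drawn from a Dirichlet distribution with coefficient $\beta$. The sketch below shows one common label-skew partition scheme of that kind, for illustration only: the exact procedure used in the paper is not shown in this summary, and the function name, the per-class sampling, and the rounding rule are assumptions. A smaller $\beta$ concentrates each class on fewer clients, which is the heterogeneity the heatmaps visualize.

```python
import random

def dirichlet_partition(labels, num_clients, beta, seed=0):
    """Split sample indices across clients with per-class Dirichlet(beta) proportions."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of normalized Gamma(beta, 1) draws.
        draws = [rng.gammavariate(beta, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        if total == 0.0:
            props = [1.0 / num_clients] * num_clients  # fallback on underflow
        else:
            props = [d / total for d in draws]
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            # The last client takes the remainder so every index is assigned.
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            end = min(max(end, start), len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients

# Hypothetical toy data: 200 samples over 10 classes, split among 5 clients.
labels = [i % 10 for i in range(200)]
parts = dirichlet_partition(labels, num_clients=5, beta=0.1, seed=0)
```

Each index lands on exactly one client, so the per-client label histograms of `parts` can be plotted directly as a heatmap of clients against classes.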

Theorems & Definitions (15)

Theorem 1
Theorem 2
Lemma 1: Intermediate results
proof
Lemma 2: Bounded perturbation difference
Lemma 3: Bounded variance of gradient difference after perturbation
proof
Lemma 4: Bounded iteration difference
Lemma 5: Bounded update difference
Lemma 6: Descent Lemma
...and 5 more
