FedGuCci: Making Local Models More Connected in Landscape for Federated Learning

Zexi Li, Jie Lin, Zhiqi Li, Didi Zhu, Tao Shen, Tao Lin, Chao Wu, Nicholas D. Lane

TL;DR

This work tackles the FL generalization gap by introducing a connectivity-centric view that leverages linear mode connectivity (LMC). It introduces FedGuCci, an FL method that strengthens group connectivity by enforcing alignment between local models and a set of fixed anchor models, and FedGuCci+, which adds bias-reduction (logit calibration) and flattening (sharpness-aware minimization) to further align local loss landscapes. The authors provide theoretical support for the transitivity of LMC and group connectivity under anchor-based training, and they demonstrate substantial generalization gains across four vision datasets and NLP benchmarks, including pretrained backbones. The proposed approach is compatible with existing FL techniques (e.g., FedSAM) and exhibits robustness to data heterogeneity and varying participation, with practical implications for scalable, privacy-preserving collaborative learning.

Abstract

Federated learning (FL) involves multiple heterogeneous clients collaboratively training a global model via iterative local updates and model fusion. The generalization of FL's global model has a large gap compared with centralized training, which is its bottleneck for broader applications. In this paper, we study and improve FL's generalization through a fundamental "connectivity" perspective, which means how the local models are connected in the parameter region and fused into a generalized global model. The term "connectivity" is derived from linear mode connectivity (LMC), which studies the interpolated loss landscape of two different solutions (e.g., modes) of neural networks. Bridging the gap between LMC and FL, in this paper, we leverage fixed anchor models to empirically and theoretically study the transitivity property of connectivity from two models (LMC) to a group of models (model fusion in FL). Based on the findings, we propose FedGuCci(+), improving group connectivity for better generalization. It is shown that our methods can boost the generalization of FL under client heterogeneity across various tasks (4 CV datasets and 6 NLP datasets) and model architectures (e.g., ViTs and PLMs). The code is available here: https://github.com/ZexiLee/fedgucci (FedGuCci Codebase).
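
To make the notion of an "interpolated loss landscape" concrete, the following is a minimal sketch of how a pairwise loss barrier could be measured along the straight line between two solutions. It is not taken from the FedGuCci codebase: `toy_loss`, `interpolate`, and `loss_barrier` are hypothetical names, the loss is a toy double-well function standing in for a network's loss, and the barrier definition used here (maximum gap between the loss on the path and the linear interpolation of the endpoint losses) is one common convention that may differ from the paper's Definition 2.1.

```python
import numpy as np


def toy_loss(w):
    """Hypothetical non-convex loss with minima at +1 and -1 in every coordinate."""
    return float(np.mean((w ** 2 - 1.0) ** 2))


def interpolate(w1, w2, alpha):
    """Point on the straight line between w2 (alpha=0) and w1 (alpha=1)."""
    return alpha * w1 + (1.0 - alpha) * w2


def loss_barrier(w1, w2, loss_fn, num_points=11):
    """Largest gap between the loss on the path and the interpolated endpoint losses.

    Assumed definition (one common convention, not necessarily the paper's):
    max over alpha of L(alpha*w1 + (1-alpha)*w2) - (alpha*L(w1) + (1-alpha)*L(w2)).
    """
    alphas = np.linspace(0.0, 1.0, num_points)
    l1, l2 = loss_fn(w1), loss_fn(w2)
    gaps = [
        loss_fn(interpolate(w1, w2, a)) - (a * l1 + (1.0 - a) * l2)
        for a in alphas
    ]
    return max(gaps)


# Two solutions in different basins: the path crosses a high-loss region.
w_a = np.array([1.0, 1.0])
w_b = np.array([-1.0, -1.0])
# A solution in the same basin as w_a: the path stays in the low-loss region.
w_c = np.array([1.0, 1.0])

barrier_ab = loss_barrier(w_a, w_b, toy_loss)
barrier_ac = loss_barrier(w_a, w_c, toy_loss)
print(f"barrier(w_a, w_b) = {barrier_ab:.3f}")
print(f"barrier(w_a, w_c) = {barrier_ac:.3f}")
```

In this toy setting, two models are linearly mode connected when the barrier is close to zero; for real networks the same loop would evaluate the network's loss (or accuracy) on held-out data at each interpolated set of weights.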

Paper Structure

This paper contains 26 sections, 6 theorems, 14 equations, 7 figures, 10 tables, and 1 algorithm.

Key Result

Lemma 3.3

Set the uniform and bounded domain for network $\mathbf{w}$ as $\mathcal{E}_\epsilon=\{\mathbf{w}\in \Omega \mid \mathcal{L}(\mathbf{w})<\epsilon\}$. Define a random event $D_\epsilon(\mathbf{w}_{\text{anc}}^*)$ as $D_\epsilon(\mathbf{w}_{\text{anc}}^*)=\{\mathbf{w} \in \mathcal{E}_\epsilon \mid \forall \ldots$ [...] where $d_\epsilon=\left|\mathcal{E}_\epsilon\right|^{\frac{1}{S}}$ represents the average diameter of [...]

Figures (7)

  • Figure 1: Illustration of the transitivity of linear mode connectivity. Left: vanilla training, where models have high barriers in LMC. Right: transitivity of LMC. Models $\mathbf{w}_1$ and $\mathbf{w}_2$ are independently trained, and both are trained to have good LMC with the anchor model $\mathbf{w}_\text{anc}^*$. At the end of training, models $\mathbf{w}_1$ and $\mathbf{w}_2$ have improved LMC with each other, showing the transitivity of LMC.
  • Figure 2: Linear mode connectivity landscapes of test accuracy, showcasing the transitivity. The accuracy barrier is the maximal accuracy drop along the landscape. (a) and (c): LMC between one trained model and the anchor model; with connectivity loss, the barrier is eliminated. (b) and (d): LMC between two trained models; training with connectivity loss yields lower barriers, showing the transitivity of LMC. CIFAR-10 is used.
  • Figure 3: Test loss landscapes of three trained models w/ and w/o connectivity loss. Visualization as in Garipov et al. (2018), with $\mathbf{w}_1^*$ at the origin. $\mathbf{w}_1^*, \mathbf{w}_2^*, \mathbf{w}_3^*$ are marked as black dots. Left: vanilla CE loss. Right: three models trained independently, each with improved LMC with the same anchor model. The right panel shows that group connectivity is improved: the three models fall into a more connected low-loss region.
  • Figure 4: Accuracy barriers (the lower, the better) of group connectivity by varying numbers of trained models $K$. There is only one anchor model for all settings. It can be seen that generally, larger $K$ will cause larger barriers, but connectivity loss can still reduce them, reflecting that the transitivity of LMC can improve group connectivity. CIFAR-10 is used.
  • Figure 5: Illustration of how FedGuCci+ aligns the local loss landscapes. (a): Due to data heterogeneity, clients have different local loss landscapes. (b): Introducing logit calibration or other FL bias reduction techniques can align the learning objectives. (c): Introducing sharpness-aware minimization can make the landscapes flatter, and as a result, the overlapping regions increase.
  • ...and 2 more figures
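
The group-connectivity barriers reported in Figure 4 concern the fusion of $K$ models rather than a path between two. As a rough illustration, the sketch below measures a group barrier on a toy loss. Everything in it is an assumption made for illustration: `toy_loss` and `group_barrier` are hypothetical names, the barrier is taken to be the loss of the uniformly averaged model minus the mean loss of the individual models (the paper's formal definitions are not reproduced in this overview and may differ), and loss is used where the figure uses accuracy.

```python
import numpy as np


def toy_loss(w):
    """Hypothetical non-convex loss with minima at +1 and -1 in every coordinate."""
    return float(np.mean((w ** 2 - 1.0) ** 2))


def group_barrier(models, loss_fn):
    """Loss of the uniformly averaged model minus the mean individual loss.

    Assumed definition for illustration; a large positive value means that
    fusing the models by averaging lands in a high-loss region.
    """
    fused = np.mean(np.stack(models), axis=0)
    mean_individual = float(np.mean([loss_fn(w) for w in models]))
    return loss_fn(fused) - mean_individual


# K = 3 models sitting in different basins: their average is far from any minimum.
scattered = [
    np.array([1.0, 1.0]),
    np.array([-1.0, 1.0]),
    np.array([1.0, -1.0]),
]
# K = 3 models near the same minimum, e.g. all pulled toward one shared anchor.
clustered = [
    np.array([1.0, 1.0]),
    np.array([0.9, 1.1]),
    np.array([1.1, 0.9]),
]

barrier_scattered = group_barrier(scattered, toy_loss)
barrier_clustered = group_barrier(clustered, toy_loss)
print(f"scattered group barrier: {barrier_scattered:.3f}")
print(f"clustered group barrier: {barrier_clustered:.3f}")
```

The contrast between the two groups mirrors the qualitative message of Figures 3 and 4: models that share a low-loss region can be averaged with little or no penalty, while models in disconnected regions cannot.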

Theorems & Definitions (10)

  • Definition 2.1
  • Lemma 3.3
  • Remark 3.4
  • Theorem 3.5
  • Definition 3.6
  • Definition 3.7
  • Theorem 3.8
  • Lemma B.1
  • Theorem B.2
  • Theorem B.3