Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning

Minghui Chen, Meirui Jiang, Xin Zhang, Qi Dou, Zehua Wang, Xiaoxiao Li

TL;DR

This work proposes a model interpolation-based local training technique called "Local Superior Soups," which enhances local training across different clients, encouraging the exploration of a connected low-loss basin within a few communication rounds through regularized model interpolation.

Abstract

Federated learning (FL) is a learning paradigm that enables collaborative training of models using decentralized data. Recently, initializing FL from pre-trained weights has been shown to improve model performance. However, the growing complexity of current pre-trained models, characterized by a substantial increase in parameters, markedly increases the cost of the communication rounds required to adapt them to FL. To address these communication costs and improve the performance of pre-trained model adaptation in FL, we propose a model interpolation-based local training technique called "Local Superior Soups." Our method enhances local training across different clients, encouraging the exploration of a connected low-loss basin within a few communication rounds through regularized model interpolation. This approach acts as a catalyst for the seamless adaptation of pre-trained models in FL. We demonstrate its effectiveness and efficiency across diverse, widely used FL datasets. Our code is available at https://github.com/ubc-tea/Local-Superior-Soups.
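
To make "model interpolation" concrete, the following is a minimal sketch of weight-space interpolation between models that share an architecture. It is not the authors' implementation: the function names (`interpolate_weights`, `uniform_soup`), the dictionary-of-arrays model representation, and the toy values are all assumptions made for illustration.

```python
import numpy as np


def interpolate_weights(models, coeffs):
    """Return the convex combination sum_i coeffs[i] * models[i], per parameter.

    `models` is a list of dicts mapping parameter names to NumPy arrays.
    This representation is hypothetical and chosen only for the example.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    if len(models) != len(coeffs):
        raise ValueError("need exactly one coefficient per model")
    if np.any(coeffs < 0) or not np.isclose(coeffs.sum(), 1.0):
        raise ValueError("coefficients must be non-negative and sum to 1")
    return {
        name: sum(c * m[name] for c, m in zip(coeffs, models))
        for name in models[0]
    }


def uniform_soup(models):
    """Average all models with equal weight (a uniform 'soup')."""
    n = len(models)
    return interpolate_weights(models, [1.0 / n] * n)


# Toy example: two tiny "models" with the same parameter names and shapes.
model_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
model_b = {"w": np.array([3.0, 4.0]), "b": np.array([2.0])}
soup = uniform_soup([model_a, model_b])
```

Averaging weights in this way tends to work well only when the models lie in a connected low-loss region, which is the property the abstract says the method encourages during local training.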

Paper Structure

This paper contains 36 sections, 2 theorems, 25 equations, 10 figures, 11 tables, 1 algorithm.

Key Result

Theorem 3.1

Under Convexity and Smoothness Assumption on $\beta$-smooth loss function, Bounded Variance of Stochastic Gradient and Bounded Variance of Local and Global Gradient assumptions, when the client learning rate is chosen properly as, $\eta = \min\{\frac{1}{4\beta}, \frac{M^{\frac{1}{2}}d}{\tau^{\frac{1

Figures (10)

  • Figure 1: Illustration of an isolated low-loss valley (left) and a connected low-loss valley with larger regions in dark red (right).
  • Figure 2: Illustration of diversity (left) and affinity (right) regularization.
  • Figure 3: Convergence comparison of our proposed LSS with FedAvg. LSS achieves high accuracy much earlier (around 6 to 8 rounds) than FedAvg, which takes hundreds of communication rounds.
  • Figure 4: Evaluation on ViT fine-tuned with LoRA (Digit5 dataset).
  • Figure 5: Ablation on the affinity & diversity.
  • ...and 5 more figures
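
The diversity and affinity regularization named in Figure 2 can be sketched as two distance terms added to the task loss. This excerpt does not give the paper's exact formulas, so everything below is an assumption: squared L2 distance as the measure, the sign convention (penalize distance to the starting model, reward distance to other candidate models), the coefficient names `lam_aff` and `lam_div`, and all function names.

```python
import numpy as np


def flat(model):
    """Flatten a dict of parameter arrays into one vector (sorted by name)."""
    return np.concatenate([model[k].ravel() for k in sorted(model)])


def affinity_term(model, anchor):
    """Squared L2 distance to the anchor (e.g. the model at the start of local training).

    Penalizing this term keeps the trained model close to the anchor.
    """
    return float(np.sum((flat(model) - flat(anchor)) ** 2))


def diversity_term(model, pool):
    """Mean squared L2 distance to the other candidate models in `pool`.

    Rewarding this term pushes candidates apart so they cover more of the basin.
    """
    if not pool:
        return 0.0
    return float(np.mean([np.sum((flat(model) - flat(p)) ** 2) for p in pool]))


def regularized_loss(task_loss, model, anchor, pool, lam_aff=0.1, lam_div=0.1):
    """Task loss plus an affinity penalty minus a diversity reward (assumed form)."""
    return (
        task_loss
        + lam_aff * affinity_term(model, anchor)
        - lam_div * diversity_term(model, pool)
    )


# Toy example with 2-parameter "models".
anchor = {"w": np.array([0.0, 0.0])}
model = {"w": np.array([1.0, 0.0])}
pool = [{"w": np.array([0.0, 1.0])}]
total = regularized_loss(1.0, model, anchor, pool)
```

In this assumed form the two terms pull in opposite directions: affinity limits how far the model drifts from its starting point, while diversity spreads the candidate models before they are interpolated.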

Theorems & Definitions (7)

  • Theorem 3.1: Convergence Rate for Convex Local Functions with Affinity and Diversity Constraint
  • Proposition 3.2
  • proof
  • Definition C.1: Bias
  • Definition C.2: Variance
  • Definition C.3: Covariance
  • Definition C.4: Locality