
Exploring Parameter-Efficient Fine-Tuning to Enable Foundation Models in Federated Learning

Guangyu Sun, Umar Khalid, Matias Mendieta, Pu Wang, Chen Chen

TL;DR

This work investigates parameter-efficient fine-tuning (PEFT) in federated learning and introduces a new framework, FedPEFT, whose performance it systematically evaluates across a variety of client stability, data distribution, and differential privacy settings.

Abstract

Federated learning (FL) has emerged as a promising paradigm for collaboratively training models without centralized access to the raw data on local devices. In the typical FL paradigm (e.g., FedAvg), model weights are exchanged between the server and the participating clients in each round. Recently, small pre-trained models have been shown to be effective in FL, improving optimization and convergence. However, recent state-of-the-art pre-trained models, known as "foundation models," are becoming more capable but also have more parameters. In conventional FL, sharing these enormous model weights can quickly place a massive communication burden on the system, especially as more capable models are employed. Can we enable these strong, readily available pre-trained models in FL to achieve excellent performance while simultaneously reducing the communication burden? To this end, we investigate the use of parameter-efficient fine-tuning in federated learning and introduce a new framework: FedPEFT. Specifically, we systematically evaluate the performance of FedPEFT across a variety of client stability, data distribution, and differential privacy settings. By locally tuning and globally sharing only a small portion of the model weights, significant reductions in total communication overhead can be achieved while maintaining competitive or even better performance across a wide range of federated learning scenarios, providing insight into a new paradigm for practical and effective federated systems.
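The core idea, tuning and sharing only a small subset of weights, can be illustrated with a toy FedAvg round. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the parameter names `weight` (standing in for a frozen backbone weight $\phi_w$) and `bias` (a tunable $\phi_b$, i.e. bias tuning as one possible PEFT choice), and the helpers `local_update`, `extract_payload`, `fedavg`, and `apply_global`, are all hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]

# Hypothetical split: "weight" stands in for the frozen pre-trained part,
# "bias" for the small tunable part that is trained and communicated.
TRAINABLE_KEYS = ("bias",)


def local_update(params: Params, grads: Params, lr: float) -> Params:
    """One SGD step on the trainable keys only; frozen keys are copied unchanged."""
    updated = {k: list(v) for k, v in params.items()}
    for k in TRAINABLE_KEYS:
        updated[k] = [p - lr * g for p, g in zip(params[k], grads[k])]
    return updated


def extract_payload(params: Params) -> Params:
    """What a client uploads: the trainable subset, not the whole model."""
    return {k: list(params[k]) for k in TRAINABLE_KEYS}


def fedavg(payloads: List[Params], num_samples: List[int]) -> Params:
    """Sample-weighted average (FedAvg) over the uploaded trainable parameters."""
    total = sum(num_samples)
    averaged: Params = {}
    for k in TRAINABLE_KEYS:
        length = len(payloads[0][k])
        averaged[k] = [
            sum(n * p[k][i] for p, n in zip(payloads, num_samples)) / total
            for i in range(length)
        ]
    return averaged


def apply_global(params: Params, global_payload: Params) -> Params:
    """Client overwrites its trainable subset with the server's aggregate."""
    merged = {k: list(v) for k, v in params.items()}
    merged.update({k: list(v) for k, v in global_payload.items()})
    return merged


if __name__ == "__main__":
    backbone = {"weight": [0.5, -1.0, 2.0], "bias": [0.0, 0.0]}
    # Hypothetical per-client gradients and local dataset sizes.
    client_grads = [
        {"weight": [1.0, 1.0, 1.0], "bias": [1.0, -1.0]},
        {"weight": [2.0, 2.0, 2.0], "bias": [3.0, 1.0]},
    ]
    sizes = [1, 3]
    payloads = [extract_payload(local_update(backbone, g, lr=0.1)) for g in client_grads]
    print(apply_global(backbone, fedavg(payloads, sizes)))
```

Only the `bias` entries ever cross the network here, so per-round traffic scales with the size of the tuned subset rather than the full backbone, while the frozen `weight` entries stay identical on every client.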

Paper Structure (15 sections, 1 theorem, 6 equations, 4 figures, 4 tables, 1 algorithm)

Key Result

Theorem 1

Let $F$ satisfy Assumptions ass:minimum through ass:bounded_grad. Then …, where $P = \frac{\beta}{2}\left(\|\bm{\theta}^{(T)}-\bm{\theta}^{(0)}\|^2+\|\delta^{(T)}-\delta^{(0)}\|^2\right)$.
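The constant $P$ is a weighted sum of how far the two parameter blocks $\bm{\theta}$ and $\delta$ move between round $0$ and round $T$. The sketch below only evaluates $P$ from given snapshots; it does not reproduce the bound itself. Treating the snapshots as flat lists of floats is an assumption, and $\beta$ is simply taken as the constant from the paper's assumptions (its exact meaning is not stated in this summary). The function names are hypothetical.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean norm ||a - b||^2 between two flat parameter vectors."""
    if len(a) != len(b):
        raise ValueError("parameter vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def theorem1_P(
    beta: float,
    theta_T: Sequence[float],
    theta_0: Sequence[float],
    delta_T: Sequence[float],
    delta_0: Sequence[float],
) -> float:
    """P = (beta / 2) * (||theta_T - theta_0||^2 + ||delta_T - delta_0||^2)."""
    return 0.5 * beta * (
        squared_distance(theta_T, theta_0) + squared_distance(delta_T, delta_0)
    )


if __name__ == "__main__":
    # Toy snapshots: ||theta_T - theta_0||^2 = 5, ||delta_T - delta_0||^2 = 4.
    print(theorem1_P(2.0, [1.0, 2.0], [0.0, 0.0], [3.0], [1.0]))  # prints 9.0
```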

Figures (4)

  • Figure 1: Process in a federated learning communication round with $\mathbf{M}$ participating clients, using ViT-Base as an example to analyze communication costs. (a) The conventional federated learning framework, in which the entire model is sent during communication. (b) FedPEFT, our proposed parameter-efficient framework for federated learning.
  • Figure 2: Methods for fine-tuning each layer of a pre-trained backbone, where $h$ denotes the input, $\phi$ the pre-trained layer, and $\phi_w$ and $\phi_b$ its weight and bias parameters, respectively.
  • Figure 3: Server accuracy as a function of the total communication budget. Communication cost is computed at 4 bytes per parameter, and the maximum number of communication rounds is 50. The number in parentheses next to each method is the number of participating clients $m$. Line transparency indicates the ratio of $m$ to the total number of clients $N=64$. The horizontal dashed line marks a target accuracy of $85\%$.
  • Figure 4: Visualization and analysis of domain gap.
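The communication budgets in Figures 1 and 3 come down to simple arithmetic: shared parameters × 4 bytes × participating clients × rounds. The sketch below follows the 4-bytes-per-parameter convention from the Figure 3 caption. Counting each round as one download plus one upload (`directions=2`) is an assumption, as are the illustrative sizes: roughly 86M parameters is an approximate ViT-Base figure, while the 100,000-parameter tuned subset and the 8 clients per round are hypothetical.

```python
BYTES_PER_PARAM = 4  # 32-bit floats, matching the Figure 3 convention


def round_cost_bytes(shared_params: int, clients: int, directions: int = 2) -> int:
    """Bytes moved in one round: each participating client receives and sends
    `shared_params` values (directions=2 is an assumption, not from the paper)."""
    return shared_params * BYTES_PER_PARAM * clients * directions


def total_cost_bytes(
    shared_params: int, clients: int, rounds: int, directions: int = 2
) -> int:
    """Total bytes over `rounds` communication rounds with a fixed client count."""
    return round_cost_bytes(shared_params, clients, directions) * rounds


if __name__ == "__main__":
    full_model = 86_000_000  # approximate ViT-Base parameter count
    peft_subset = 100_000    # hypothetical size of a tuned parameter subset
    for name, n in [("full model", full_model), ("PEFT subset", peft_subset)]:
        gb = total_cost_bytes(n, clients=8, rounds=50) / 1e9
        print(f"{name}: {gb:.2f} GB over 50 rounds")
```

Because the cost is linear in the number of shared parameters, shrinking what is communicated by some factor shrinks the total budget by the same factor for a fixed number of rounds and clients.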

Theorems & Definitions (2)

  • Remark 1
  • Theorem 1