Communication-Efficient and Personalized Federated Foundation Model Fine-Tuning via Tri-Matrix Adaptation

Yongle Li, Bo Liu, Sheng Huang, Zheng Zhang, Xiaotong Yuan, Richang Hong

TL;DR

This paper presents a novel LoRA parameter factorization that introduces a small dense matrix. Transmitting only this matrix significantly reduces communication cost while achieving empirical performance comparable to transmitting the low-rank parameter matrices used by existing methods.

Abstract

In federated learning, fine-tuning pre-trained foundation models poses significant challenges, particularly high communication cost and suboptimal model performance caused by data heterogeneity across clients. To address these issues, this paper introduces communication-efficient federated LoRA adaptation (CE-LoRA), a method that combines a tri-factorization low-rank adaptation approach with personalized model parameter aggregation. We first present a novel LoRA parameter factorization that introduces a small dense matrix, which significantly reduces communication cost while achieving empirical performance comparable to transmitting the low-rank parameter matrices used by existing methods. Without violating data privacy, the server considers client similarity in both the training data and the model parameter space, and learns personalized weights for model aggregation. Our experiments on various LLM and VLM fine-tuning tasks demonstrate that CE-LoRA not only significantly reduces communication overhead but also improves performance when client data are not independent and identically distributed (non-IID). In addition, CE-LoRA improves data privacy protection, effectively mitigating gradient-based data reconstruction attacks.
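
To make the communication saving concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the shapes given in the Figure 2 caption ($A\in \mathbb{R}^{d\times r}$, $B\in \mathbb{R}^{r\times k}$, $C\in \mathbb{R}^{r\times r}$) and assumes the update is composed as $A C B$; the function names, the layer sizes, and the identity initialization of $C$ are illustrative choices, not taken from the paper.

```python
import numpy as np

def lora_comm_params(d, k, r):
    """Parameters sent per adapted layer if both low-rank factors are transmitted."""
    return r * (d + k)

def tri_comm_params(r):
    """Parameters sent per adapted layer if only the r x r matrix C is transmitted."""
    return r * r

def tri_delta(A, C, B):
    """Weight update under the assumed composition A @ C @ B (shape d x k)."""
    return A @ C @ B

# Hypothetical layer sizes, chosen only for illustration.
d, k, r = 768, 768, 8
rng = np.random.default_rng(0)
A = rng.normal(size=(d, r))   # stays on the client
B = rng.normal(size=(r, k))   # stays on the client
C = np.eye(r)                 # small dense matrix; identity init is an assumption

delta = tri_delta(A, C, B)
ratio = lora_comm_params(d, k, r) / tri_comm_params(r)
print(delta.shape, lora_comm_params(d, k, r), tri_comm_params(r), ratio)
```

Under these assumed sizes the per-layer payload drops from r(d + k) values to r² values. The exact ratio depends on the layer dimensions and the rank, so this figure should not be read as a result reported in the paper.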

Paper Structure

This paper contains 22 sections, 15 equations, 10 figures, 6 tables, 1 algorithm.

Figures (10)

  • Figure 1: Comparison of communication cost for three federated fine-tuning methods for pretrained models: transferring all model parameters, LoRA-based FL fine-tuning [DBLP:conf/acl/ZhangYDWYQX23], and the proposed CE-LoRA method. The vertical axis shows the number of parameters that must be transferred per iteration, on a logarithmic scale. CE-LoRA can reduce communication cost by a factor of several hundred relative to LoRA-based fine-tuning.
  • Figure 2: Our CE-LoRA framework consists of two main components: local fine-tuning on the clients and global aggregation on the server. In the local fine-tuning stage, we freeze the pretrained foundation model and fine-tune it using the proposed CE-LoRA method. In addition to the low-rank matrices $A\in \mathbb{R}^{d\times r}, B\in \mathbb{R}^{r\times k}, r\ll \min(k,d)$, we introduce a full-rank matrix $C\in \mathbb{R}^{r\times r}$ that serves as the parameter matrix transmitted between clients and server. After receiving the matrices $\{C_{i}^{t}\}_{i=1}^n$, the server computes the similarity between them using the proposed similarity metric, which jointly considers data distribution and model similarity. These pairwise client similarities are used to derive the model aggregation weights for updating $\{\bar{C}_{i}^{t}\}_{i=1}^n$.
  • Figure 3: Illustration of the LoRA Triple Factorization. The pre-trained model is frozen during training, while the trainable LoRA is decomposed into $A \in \mathbb{R}^{r \times d}$, $B \in \mathbb{R}^{k \times r}$, and $C \in \mathbb{R}^{r \times r}$, where $r \ll \min(k, d)$. During federated learning, only $C$ is transmitted for model parameter aggregation.
  • Figure 4: Performance comparison of worst-performing client and best-performing client.
  • Figure 5: Comparison of data reconstruction attacks on the dataset under full PFM fine-tuning, FedPETuning, FFA-LoRA, and CE-LoRA.
  • ...and 5 more figures
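
The server-side step described in the Figure 2 caption (pairwise client similarities turned into per-client aggregation weights over the received $C_i$ matrices) can be sketched as follows. This is an illustrative stand-in, not the paper's metric: it uses only cosine similarity between flattened $C_i$ matrices and a row-wise softmax, and it omits the data-distribution term because its form is not given here. The function names and the `temperature` parameter are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(Cs):
    """Pairwise cosine similarity between flattened client matrices (stand-in metric)."""
    flat = np.stack([C.ravel() for C in Cs])
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def personalized_aggregate(Cs, temperature=1.0):
    """Return (W, C_bar): row-stochastic weights and one aggregated matrix per client.

    The softmax over similarities is an assumed weighting rule, used only to illustrate
    'similarities -> aggregation weights'; the paper learns its weights differently.
    """
    S = cosine_similarity_matrix(Cs)
    logits = S / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    W = np.exp(logits)
    W /= W.sum(axis=1, keepdims=True)
    stacked = np.stack(Cs)                        # shape (n, r, r)
    C_bar = np.einsum("ij,jkl->ikl", W, stacked)  # C_bar[i] = sum_j W[i, j] * Cs[j]
    return W, C_bar

# Three hypothetical clients: 0 and 1 are similar, 2 points the opposite way.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 4))
Cs = [base, base + 0.01 * rng.normal(size=(4, 4)), -base]
W, C_bar = personalized_aggregate(Cs)
print(np.round(W, 3))
```

Each row of `W` sums to one, so every client receives its own convex combination of the uploaded matrices, with similar clients weighted more heavily than dissimilar ones.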