LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement

Jieming Bian, Lei Wang, Letian Zhang, Jie Xu

TL;DR

This paper tackles the high cost of fine-tuning foundation models by combining LoRA, which reduces the number of trainable parameters, with Federated Learning, which keeps training data private. It identifies two key issues in federated LoRA, server-side aggregation bias and client-side initialization lag. It then proposes LoRA-FAIR, which refines server-side aggregation with a residual correction $\Delta \mathbf{B}$ to form $\bar{\mathbf{B}}' = \bar{\mathbf{B}} + \Delta\mathbf{B}$, and applies Avg-Initial initialization to preserve cross-round information. On ViT and MLP-Mixer models evaluated on DomainNet and NICO++ under feature non-IID settings, the approach outperforms state-of-the-art baselines such as FedIT, FFA-LoRA, FLoRA, and FlexLoRA while keeping communication costs favorable. Theoretical results show a contraction in the residual aggregation error and improved convergence bounds in non-IID settings, which support the practical benefit of LoRA-FAIR for scalable federated fine-tuning of large models.
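To make the communication-cost comparison concrete, the sketch below counts the matrix entries moved per round for a single adapted $d \times k$ layer. It is a back-of-the-envelope model built on stated assumptions, not the paper's measurement. We assume every method uploads one rank-$r$ pair $(\mathbf{A}_k, \mathbf{B}_k)$ per client, and that FedIT, FlexLoRA, and LoRA-FAIR send one rank-$r$ pair back (for LoRA-FAIR, $\bar{\mathbf{A}}$ and $\bar{\mathbf{B}}'$). We also assume that FLoRA broadcasts all client modules stacked, so its downlink rank grows with the number of clients. The function names and the layer sizes are illustrative choices, not the paper's settings.

```python
def lora_pair_size(d: int, k: int, r: int) -> int:
    """Entries in one LoRA pair for a d x k weight: B is d x r, A is r x k."""
    return r * (d + k)


def round_cost(method: str, d: int, k: int, r: int, num_clients: int) -> int:
    """Hypothetical per-round count of transmitted entries (uplink + downlink,
    summed over clients) for one adapted d x k layer, under the assumptions above."""
    # Uplink: every client uploads its own rank-r pair (A_k, B_k).
    uplink = num_clients * lora_pair_size(d, k, r)
    if method in ("FedIT", "FlexLoRA", "LoRA-FAIR"):
        # Downlink: one rank-r pair per client (LoRA-FAIR: A_bar and corrected B_bar').
        downlink = num_clients * lora_pair_size(d, k, r)
    elif method == "FLoRA":
        # Assumption: client modules are stacked, so each client receives
        # a pair of rank num_clients * r.
        downlink = num_clients * lora_pair_size(d, k, num_clients * r)
    else:
        raise ValueError(f"unknown method: {method}")
    return uplink + downlink


# Illustrative sizes: a 768 x 768 projection, rank 16, 10 clients.
for m in ("FedIT", "FlexLoRA", "LoRA-FAIR", "FLoRA"):
    print(m, round_cost(m, 768, 768, 16, 10))
```

With these sizes, FedIT, FlexLoRA, and LoRA-FAIR each move 491,520 entries per round. Under the stacking assumption, the FLoRA downlink alone is ten times its uplink, and that ratio grows linearly with the number of clients.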

Abstract

Foundation models (FMs) achieve strong performance across diverse tasks with task-specific fine-tuning, yet full parameter fine-tuning is often computationally prohibitive for large models. Parameter-efficient fine-tuning (PEFT) methods like Low-Rank Adaptation (LoRA) reduce this cost by introducing low-rank matrices so that fewer parameters are tuned. While LoRA allows for efficient fine-tuning, it requires significant data for adaptation, making Federated Learning (FL) an appealing solution due to its privacy-preserving collaborative framework. However, combining LoRA with FL introduces two key challenges. The first is Server-Side Aggregation Bias, where server-side averaging of LoRA matrices diverges from the ideal global update. The second is Client-Side Initialization Lag, which reflects the need for consistent initialization across rounds. Existing approaches address these challenges individually, limiting their effectiveness. We propose LoRA-FAIR, a novel method that tackles both issues by introducing a correction term on the server, improving aggregation efficiency and accuracy. LoRA-FAIR maintains computational and communication efficiency, yielding superior performance over state-of-the-art methods. Experimental results on ViT and MLP-Mixer models across large-scale datasets demonstrate that LoRA-FAIR consistently achieves performance improvements in FL settings.
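The aggregation bias can be checked numerically. The product of averages is $(\sum_i w_i \mathbf{B}_i)(\sum_j w_j \mathbf{A}_j) = \sum_i \sum_j w_i w_j \mathbf{B}_i \mathbf{A}_j$, which contains cross terms $\mathbf{B}_i\mathbf{A}_j$ ($i \neq j$) that no client ever trained. It also has rank at most $r$, whereas the average of products $\sum_k w_k \mathbf{B}_k\mathbf{A}_k$ can reach rank $Kr$. The sketch below is a minimal illustration that assumes random client matrices, equal client weights, and small illustrative sizes. The helper `aggregate` is hypothetical.

```python
import numpy as np


def aggregate(B_list, A_list, weights):
    """Return (average of products, product of averages) for one LoRA layer."""
    # Ideal update: weighted average of the per-client products B_k A_k.
    ideal = sum(w * (B @ A) for w, B, A in zip(weights, B_list, A_list))
    # Naive update: average A_k and B_k separately, then multiply the averages.
    A_bar = sum(w * A for w, A in zip(weights, A_list))
    B_bar = sum(w * B for w, B in zip(weights, B_list))
    return ideal, B_bar @ A_bar


rng = np.random.default_rng(0)
d, k, r, K = 8, 6, 2, 3
weights = np.full(K, 1.0 / K)  # assumption: equal client weights
B_list = [rng.normal(size=(d, r)) for _ in range(K)]
A_list = [rng.normal(size=(r, k)) for _ in range(K)]

ideal, approx = aggregate(B_list, A_list, weights)
print("aggregation gap:", np.linalg.norm(ideal - approx))
print("rank of ideal:", np.linalg.matrix_rank(ideal),
      "rank of approx:", np.linalg.matrix_rank(approx))
```

The gap disappears only in special cases, for example when every client holds the same $(\mathbf{A}_k, \mathbf{B}_k)$. With heterogeneous clients, as in the feature non-IID settings studied here, it is generally non-zero.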

Paper Structure

This paper contains 23 sections, 3 theorems, 35 equations, 9 figures, 7 tables.

Key Result

Theorem 11.1

For analytical tractability, we consider the case where the similarity metric $S$ is based on the Frobenius norm. The residual correction term $\Delta\mathbf{B}$ obtained by minimizing Equation (8) guarantees that $(\bar{\mathbf{B}}+\Delta\mathbf{B})\bar{\mathbf{A}}$ approaches the ideal global update $\Delta\mathbf{W}$ … where $\sigma_{\min}(\bar{\mathbf{A}})$ is the smallest non-zero singular value of $\bar{\mathbf{A}}$.
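Equation (8) is not reproduced in this summary, so the sketch below assumes one concrete Frobenius-norm form: $\min_{\Delta\mathbf{B}} \|\Delta\mathbf{W} - (\bar{\mathbf{B}}+\Delta\mathbf{B})\bar{\mathbf{A}}\|_F^2 + \lambda\|\Delta\mathbf{B}\|_F^2$. Here $\lambda$ plays the role of the regularization weight from the ablation. Under that assumption, the problem is a ridge regression with the closed form $\Delta\mathbf{B} = \mathbf{R}\bar{\mathbf{A}}^\top(\bar{\mathbf{A}}\bar{\mathbf{A}}^\top + \lambda\mathbf{I})^{-1}$, where $\mathbf{R} = \Delta\mathbf{W} - \bar{\mathbf{B}}\bar{\mathbf{A}}$. For $\lambda = 0$ it reduces to $\mathbf{R}\bar{\mathbf{A}}^{+}$. The paper's practical similarity metric and solver may differ. The function name and the random data are hypothetical.

```python
import numpy as np


def residual_correction(delta_w, B_bar, A_bar, lam):
    """Minimize ||delta_w - (B_bar + dB) A_bar||_F^2 + lam * ||dB||_F^2 (assumed objective)."""
    R = delta_w - B_bar @ A_bar  # residual left by averaging A and B separately
    if lam == 0:
        # Unregularized least squares: dB = R A_bar^+ (Moore-Penrose pseudoinverse).
        return R @ np.linalg.pinv(A_bar)
    r = A_bar.shape[0]
    gram = A_bar @ A_bar.T + lam * np.eye(r)  # r x r, symmetric positive definite for lam > 0
    # Normal equations dB @ gram = R @ A_bar.T, solved through the transpose.
    return np.linalg.solve(gram, (R @ A_bar.T).T).T


rng = np.random.default_rng(1)
d, k, r, K = 8, 6, 2, 3
B_list = [rng.normal(size=(d, r)) for _ in range(K)]
A_list = [rng.normal(size=(r, k)) for _ in range(K)]
delta_w = sum(B @ A for B, A in zip(B_list, A_list)) / K  # ideal update, equal weights
A_bar = sum(A_list) / K
B_bar = sum(B_list) / K

before = np.linalg.norm(delta_w - B_bar @ A_bar)
for lam in (0.0, 0.1, 1.0):
    dB = residual_correction(delta_w, B_bar, A_bar, lam)
    after = np.linalg.norm(delta_w - (B_bar + dB) @ A_bar)
    print(f"lam={lam}: error {before:.3f} -> {after:.3f}, ||dB||_F = {np.linalg.norm(dB):.3f}")
```

Because $\Delta\mathbf{B}$ only right-multiplies $\bar{\mathbf{A}}$, the corrected product stays in the row space of $\bar{\mathbf{A}}$. The error therefore shrinks but generally does not reach zero. For $\lambda = 0$, the correction involves $\bar{\mathbf{A}}^{+}$, whose spectral norm is $1/\sigma_{\min}(\bar{\mathbf{A}})$. This is one way to see why that singular value appears in the theorem.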

Figures (9)

  • Figure 1: Illustration of LoRA-FAIR. Instead of directly averaging the local LoRA modules $\mathbf{A}_k$ and $\mathbf{B}_k$ collected from each client $k$ on the server side and sending the averaged LoRA modules $\bar{\mathbf{A}}$ and $\bar{\mathbf{B}}$ back to clients, LoRA-FAIR reconstructs the ideal global update $\Delta\mathbf{W}$ (the average of the client products $\mathbf{B}_k\mathbf{A}_k$). It then finds the residual LoRA module $\Delta\mathbf{B}$ by solving the server-side optimization problem and replaces $\bar{\mathbf{B}}$ with the corrected LoRA module $\bar{\mathbf{B}}' = \bar{\mathbf{B}} + \Delta\mathbf{B}$. See details in the Method section.
  • Figure 2: Comparison of two aggregation strategies, AvgToMul and MulToAvg. AvgToMul averages the LoRA matrices $\mathbf{A}_k$ and $\mathbf{B}_k$ from clients, then multiplies the averages to obtain the approximate global update $\Delta\mathbf{W}' = \bar{\mathbf{B}}\bar{\mathbf{A}}$. MulToAvg first multiplies each client's matrices (yielding $\mathbf{B}_k \mathbf{A}_k$) and then averages these products to obtain the true global update $\Delta\mathbf{W}$. While AvgToMul is communication-efficient, MulToAvg better captures the intended global model update. See the section on the first challenge, server-side aggregation bias.
  • Figure 3: Comparison of three initialization strategies: Avg-Initial, Re-Initial, and Local-Initial. Avg-Initial is the most effective because it balances continuity and unification across clients, mitigating client-side initialization lag and improving performance. For more details, see the section on the second challenge, client-side initialization lag.
  • Figure 4: Communication cost comparison. LoRA-FAIR matches the communication cost of FedIT and FlexLoRA and avoids FLoRA's high overhead. Details in the Experiments section.
  • Figure 5: Impact of the regularization weight $\lambda$. With $\lambda = 0$, LoRA-FAIR yields the lowest performance, underscoring the importance of the regularization term. Details in the ablation study.
  • ...and 4 more figures
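The three initialization strategies compared in Figure 3 differ only in which LoRA pair a client starts the next round from. The sketch below assumes that Avg-Initial starts every client from the server's $(\bar{\mathbf{A}}, \bar{\mathbf{B}}')$, as in Figure 1, and that Local-Initial resumes each client's own $(\mathbf{A}_k, \mathbf{B}_k)$. It also assumes that Re-Initial uses the standard LoRA initialization (Gaussian $\mathbf{A}$, zero $\mathbf{B}$). Under Re-Initial, how earlier rounds' updates are retained (for example, by merging them into the frozen weights) is not modeled here. The helper name `next_round_init` and the parameter `init_std` are hypothetical.

```python
import numpy as np


def next_round_init(strategy, A_bar, B_bar_prime, A_local, B_local, rng, init_std=0.01):
    """Hypothetical helper: the LoRA pair (A, B) a client starts the next round from."""
    if strategy == "Avg-Initial":
        # All clients share the server modules, carrying the last global update forward.
        return A_bar.copy(), B_bar_prime.copy()
    if strategy == "Local-Initial":
        # Each client resumes its own modules: continuity, but no cross-client unification.
        return A_local.copy(), B_local.copy()
    if strategy == "Re-Initial":
        # Standard LoRA restart: Gaussian A and zero B, so the adapter starts at B A = 0.
        r, k = A_bar.shape
        d = B_bar_prime.shape[0]
        return rng.normal(scale=init_std, size=(r, k)), np.zeros((d, r))
    raise ValueError(f"unknown strategy: {strategy}")


rng = np.random.default_rng(2)
d, k, r = 8, 6, 2
A_bar, B_bar_prime = rng.normal(size=(r, k)), rng.normal(size=(d, r))
local = [(rng.normal(size=(r, k)), rng.normal(size=(d, r))) for _ in range(3)]
for strategy in ("Avg-Initial", "Local-Initial", "Re-Initial"):
    starts = [next_round_init(strategy, A_bar, B_bar_prime, A_k, B_k, rng) for A_k, B_k in local]
    shared = all(np.allclose(B @ A, starts[0][1] @ starts[0][0]) for A, B in starts)
    size = np.linalg.norm(starts[0][1] @ starts[0][0])
    print(f"{strategy}: shared start = {shared}, ||B A||_F = {size:.3f}")
```

In this toy setup, Avg-Initial is the only strategy whose starting point is both shared across clients and non-zero. That is one reading of the "continuity and unification" balance described in Figure 3.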

Theorems & Definitions (5)

  • Theorem 11.1
  • proof
  • Corollary 11.2
  • Theorem 11.3: Convergence of Federated LoRA Fine-Tuning
  • proof