Communication-Efficient and Accurate Approach for Aggregation in Federated Low-Rank Adaptation

Le-Tuan Nguyen, Minh-Duong Nguyen, Seon-Geun Jeong, Dung D. Le, Quoc-Viet Pham

TL;DR

FLoRA-NA tackles aggregation errors in Federated Low-Rank Adaptation by learning server-side coefficients to form surrogate global matrices $\bar{B}^{(i)}$ and $\bar{A}^{(i)}$ so that $\bar{B}^{(i)}\bar{A}^{(i)}$ closely matches the ideal $\frac{1}{U}\sum_u B_u^{(i)}A_u^{(i)}$ without increasing communication. It introduces a concise optimization over $P$ and $Q$ to compute $\bar{B}$ and $\bar{A}$, offering near-accurate aggregation with minimal overhead and extending to HiRA/DoRA variants. The paper provides a convergence analysis bounding the impact of aggregation divergence via $\varrho$, showing that reducing this divergence yields improved convergence in both convex and non-convex settings, with $\varrho_{\text{FLoRA-NA}} \le \epsilon$. Empirically, FLoRA-NA delivers state-of-the-art global generalization on NLP, mathematical reasoning, and code tasks while maintaining FedLoRA’s lightweight communication and demonstrating robustness to data heterogeneity and varying client counts, and compatibility with compression techniques. Overall, FLoRA-NA offers a scalable, communication-efficient approach that bridges local personalization and global generalization for federated fine-tuning of foundation models.

Abstract

With the rapid emergence of foundation models and the increasing need for fine-tuning across distributed environments, Federated Low-Rank Adaptation (FedLoRA) has recently gained significant attention. Despite its enormous potential, current FedLoRA methods face notable challenges due to inexact updates. Existing approaches have attempted to mitigate this issue, but they often introduce a local-global generalization gap and incur substantial communication overhead, limiting their scalability and effectiveness. To address these limitations, we propose Federated Low-Rank Aggregation with Nearly Accurate Estimation (FLoRA-NA). FLoRA-NA leverages the local LoRA matrices on the server to estimate the aggregated matrices $\hat{A}$ and $\hat{B}$, which are then distributed to clients for local updates. These surrogate aggregated matrices minimize the divergence between the ideal update $\nabla \bar{W} = \sum^{U}_{u=1}B_u A_u$ and the practical update $\nabla \hat{W} = \hat{B}\hat{A}$ without adding communication cost beyond vanilla FedLoRA. By doing so, FLoRA-NA achieves communication efficiency and bridges the gap between local personalization and global generalization, addressing a key limitation of prior personalized FedLoRA approaches. We conduct extensive evaluations across diverse tasks, including natural language understanding, mathematical reasoning, and code-solving ability, using various foundation models. Experimental results consistently demonstrate that FLoRA-NA achieves state-of-the-art global performance while maintaining low communication overhead.
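
To make the "inexact updates" concrete, here is a small worked case that is not taken from the paper: two clients ($U = 2$), the $\frac{1}{U}$-normalized ideal update from the TL;DR, and naive averaging of the two LoRA factors separately. The product of the averaged factors differs from the average of the products by a cross term:

$$
\frac{B_1 + B_2}{2}\cdot\frac{A_1 + A_2}{2} \;-\; \frac{B_1 A_1 + B_2 A_2}{2} \;=\; -\frac{1}{4}\,(B_1 - B_2)(A_1 - A_2).
$$

The right-hand side vanishes only when the clients agree on $B$ or on $A$ (or the product of the differences happens to be zero), so under this reading the gap grows as client factors drift apart.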

Paper Structure

This paper contains 51 sections, 12 theorems, 50 equations, 4 figures, 11 tables, 1 algorithm.

Key Result

Corollary 1

Since $P, Q \in {\mathbb{R}}^{U \times 1}$ and $U \ll k \times d$, the optimization involving the two learnable coefficients $P$ and $Q$ is significantly simpler compared to directly minimizing over the LoRA matrices $A$ and $B$. Consequently, the proposed approach achieves near-accurate aggregation
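
The idea in Corollary 1 can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: it assumes the objective is the squared Frobenius distance between $\bar{B}\bar{A}$ and $\frac{1}{U}\sum_u B_u A_u$, with $\bar{B} = \sum_u P_u B_u$ and $\bar{A} = \sum_u Q_u A_u$, and it swaps in alternating least squares as the solver because the source does not say how $P$ and $Q$ are optimized. The function names `ideal_update`, `surrogate_matrices`, and `fit_coefficients` are hypothetical.

```python
import numpy as np


def ideal_update(Bs, As):
    """Ideal aggregate: the mean of the per-client products B_u @ A_u."""
    return np.mean([B @ A for B, A in zip(Bs, As)], axis=0)


def surrogate_matrices(Bs, As, P, Q):
    """Weighted sums B_bar = sum_u P[u] * B_u and A_bar = sum_u Q[u] * A_u."""
    B_bar = np.tensordot(P, np.stack(Bs), axes=1)  # shape (d, r)
    A_bar = np.tensordot(Q, np.stack(As), axes=1)  # shape (r, k)
    return B_bar, A_bar


def fit_coefficients(Bs, As, n_iters=20):
    """Fit P and Q (length U each) by alternating least squares.

    Assumption: the solver is chosen for this sketch only. With P fixed,
    B_bar @ A_bar is linear in Q (and vice versa), so each half-step is an
    ordinary least-squares problem with just U unknowns.
    """
    U = len(Bs)
    target = ideal_update(Bs, As).ravel()
    # Start from naive FedAvg weights, so the fit can only reduce the error.
    P = np.full(U, 1.0 / U)
    Q = np.full(U, 1.0 / U)
    for _step in range(n_iters):
        # Fix P, solve for Q: column v of M is vec(B_bar @ A_v).
        B_bar, _ = surrogate_matrices(Bs, As, P, Q)
        M = np.stack([(B_bar @ A).ravel() for A in As], axis=1)
        Q = np.linalg.lstsq(M, target, rcond=None)[0]
        # Fix Q, solve for P: column u of M is vec(B_u @ A_bar).
        _, A_bar = surrogate_matrices(Bs, As, P, Q)
        M = np.stack([(B @ A_bar).ravel() for B in Bs], axis=1)
        P = np.linalg.lstsq(M, target, rcond=None)[0]
    return P, Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    U, d, r, k = 4, 6, 2, 5  # toy sizes, chosen only for the demo
    Bs = [rng.normal(size=(d, r)) for _ in range(U)]
    As = [rng.normal(size=(r, k)) for _ in range(U)]

    target = ideal_update(Bs, As)
    naive = np.mean(Bs, axis=0) @ np.mean(As, axis=0)
    P, Q = fit_coefficients(Bs, As)
    B_bar, A_bar = surrogate_matrices(Bs, As, P, Q)

    print("naive FedAvg error:", np.linalg.norm(naive - target))
    print("fitted P, Q error: ", np.linalg.norm(B_bar @ A_bar - target))
```

Only $2U$ scalars are fitted here, regardless of $d$, $k$, or the rank, which is the point the corollary makes. The server still returns one $\bar{B}$ and one $\bar{A}$ of the usual LoRA shapes, so nothing beyond what vanilla FedLoRA exchanges would need to be communicated in this sketch.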

Figures (4)

  • Figure 1: We evaluate leading FedLoRA methods on the MNLI dataset and observe a notable gap between local and global test accuracy. Our proposed method, FLoRA-NA, shows state-of-the-art robustness by mitigating inter-client divergence throughout the learning process, leading to a reduced local-global generalization gap.
  • Figure 2: Comparison of the normalized Frobenius norm of divergence between the gradient obtained from the ideal update and that from the approximate update under the naive FedAvg strategy with full-parameter, using FedIT and the proposed FLoRA-NA method on the MNLI dataset.
  • Figure 3: Performance w.r.t. data heterogeneity $\alpha$ for four datasets.
  • Figure 4: Comparison of the layer-wise normalized Frobenius norm of divergence between the gradient obtained from the ideal update and that from the approximate update under the naive FedAvg strategy with full-parameter, using FedIT and the proposed FLoRA-NA method. This experiment is conducted on the MNLI dataset.

Theorems & Definitions (14)

  • Corollary 1: Computation Efficiency
  • Corollary 2: Communication Efficiency
  • Definition 1
  • Theorem 1
  • Corollary 3
  • Theorem 2
  • Corollary 4
  • Lemma 1: Jensen's Inequality
  • Lemma 2: Gradient bound via Bregman divergence
  • Lemma 3: Co-coercivity of convex smooth function
  • ...and 4 more