FedHL: Federated Learning for Heterogeneous Low-Rank Adaptation via Unbiased Aggregation

Zihao Peng, Jiandian Zeng, Boyuan Li, Guo Li, Shengbo Chen, Tian Wang

TL;DR

This work identifies truncation bias and gradient drift as critical obstacles to convergence when applying heterogeneous LoRA in federated fine-tuning. It introduces FedHL, a framework that uses the full-rank global model as a baseline for unbiased aggregation and derives optimal, round-specific aggregation weights to minimize gradient drift, achieving a theoretical convergence rate of $\mathcal{O}(1/\sqrt{T})$. Empirically, FedHL yields 1–3% improvements over state-of-the-art methods across cross-silo and cross-device settings and demonstrates robust performance under varied LoRA ranks and participation. The results offer a principled approach to resilient federated fine-tuning of foundation models with heterogeneous LoRA configurations, with potential impact on privacy-preserving, communication-efficient large-scale NLP and multimodal learning.

Abstract

Federated Learning (FL) facilitates the fine-tuning of Foundation Models (FMs) using distributed data sources, with Low-Rank Adaptation (LoRA) gaining popularity due to its low communication costs and strong performance. While recent work acknowledges the benefits of heterogeneous LoRA in FL and introduces flexible algorithms to support its implementation, our theoretical analysis reveals a critical gap: existing methods lack formal convergence guarantees due to parameter truncation and biased gradient updates. Specifically, adapting client-specific LoRA ranks necessitates truncating global parameters, which introduces inherent truncation errors and leads to subsequent inaccurate gradient updates that accumulate over training rounds, ultimately degrading performance. To address these issues, we propose FedHL, a simple yet effective Federated Learning framework tailored for Heterogeneous LoRA. By leveraging the full-rank global model as a calibrated aggregation basis, FedHL eliminates the direct truncation bias from initial alignment with client-specific ranks. Furthermore, we derive the theoretically optimal aggregation weights by minimizing the gradient drift term in the convergence upper bound. Our analysis shows that FedHL guarantees an $\mathcal{O}(1/\sqrt{T})$ convergence rate, and experiments on multiple real-world datasets demonstrate a 1–3% improvement over several state-of-the-art methods.
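
The truncation error described in the abstract can be illustrated with a small numerical sketch. This is not the FedHL algorithm: it assumes, purely for illustration, that a global update is cut down to a client's rank with a truncated SVD. The matrix sizes, the client ranks, and the helper names `truncate_to_rank` and `truncation_error` are all hypothetical. It shows only that clients with a lower rank receive a less accurate copy of the global parameters.

```python
import numpy as np


def truncate_to_rank(delta_w: np.ndarray, rank: int) -> np.ndarray:
    """Return the best rank-`rank` approximation of `delta_w` via truncated SVD."""
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    # Keep only the leading `rank` singular triplets.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def truncation_error(delta_w: np.ndarray, rank: int) -> float:
    """Frobenius norm of what is lost when `delta_w` is truncated to `rank`."""
    return float(np.linalg.norm(delta_w - truncate_to_rank(delta_w, rank)))


rng = np.random.default_rng(0)

# Hypothetical global update of rank 16 (shapes chosen only for illustration).
global_update = rng.standard_normal((64, 16)) @ rng.standard_normal((16, 32))

# Hypothetical client LoRA ranks.
client_ranks = [4, 8, 16]
errors = {r: truncation_error(global_update, r) for r in client_ranks}

for r in client_ranks:
    print(f"rank {r:2d}: truncation error = {errors[r]:.4f}")
```

In this sketch the error shrinks as the client rank grows and is numerically zero once the client rank matches the rank of the global update, which is the discrepancy the abstract says accumulates across rounds when left uncorrected.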

Paper Structure

This paper contains 25 sections, 51 equations, 10 figures, 8 tables, 1 algorithm.

Figures (10)

  • Figure 1: Heterogeneous client environments necessitate LoRA modules of varying ranks, requiring the design of targeted federated aggregation algorithms.
  • Figure 2: Overview of FedHL. In heterogeneous LoRA, truncation and gradient biases emerge relative to ideal full-rank updates, as illustrated in the upper-right panel. During server-side aggregation, a closer unbiased aggregation baseline $W_t$ is introduced to correct the initial truncation bias, and the weights $p_i$ are dynamically adjusted to minimize training bias.
  • Figure 3: Training Loss for LoRA-Based Configurations under Moderate and High Heterogeneity (Fed-GSM8K).
  • Figure 4: Training Loss for LoRA-Based Configurations under Moderate and High Heterogeneity (Fed-CodeAlpaca).
  • Figure 5: Comparison of Client Model Scores Across Different LoRA Ranks in a Cross-Device Setting. Left: Fed-GSM8K. Right: Fed-CodeAlpaca.
  • ...and 5 more figures

Theorems & Definitions (11)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Remark 6
  • proof
  • proof
  • proof
  • proof
  • ...and 1 more