FedHL: Federated Learning for Heterogeneous Low-Rank Adaptation via Unbiased Aggregation
Zihao Peng, Jiandian Zeng, Boyuan Li, Guo Li, Shengbo Chen, Tian Wang
TL;DR
This work identifies truncation bias and gradient drift as critical obstacles to convergence when applying heterogeneous LoRA in federated fine-tuning. It introduces FedHL, a framework that uses the full-rank global model as a baseline for unbiased aggregation and derives optimal, round-specific aggregation weights to minimize gradient drift, achieving a theoretical convergence rate of $\mathcal{O}(1/\sqrt{T})$. Empirically, FedHL yields 1–3% improvements over state-of-the-art methods across cross-silo and cross-device settings and demonstrates robust performance under varied LoRA ranks and participation. The results offer a principled approach to resilient federated fine-tuning of foundation models with heterogeneous LoRA configurations, with potential impact on privacy-preserving, communication-efficient large-scale NLP and multimodal learning.
Abstract
Federated Learning (FL) facilitates the fine-tuning of Foundation Models (FMs) using distributed data sources, with Low-Rank Adaptation (LoRA) gaining popularity due to its low communication costs and strong performance. While recent work acknowledges the benefits of heterogeneous LoRA in FL and introduces flexible algorithms to support its implementation, our theoretical analysis reveals a critical gap: existing methods lack formal convergence guarantees due to parameter truncation and biased gradient updates. Specifically, adapting to client-specific LoRA ranks requires truncating the global parameters, which introduces inherent truncation errors. These errors in turn produce inaccurate gradient updates that accumulate over training rounds, ultimately degrading performance. To address these issues, we propose \textbf{FedHL}, a simple yet effective \textbf{Fed}erated Learning framework tailored for \textbf{H}eterogeneous \textbf{L}oRA. By leveraging the full-rank global model as a calibrated aggregation basis, FedHL eliminates the direct truncation bias from initial alignment with client-specific ranks. Furthermore, we derive the theoretically optimal aggregation weights by minimizing the gradient drift term in the convergence upper bound. Our analysis shows that FedHL guarantees an $\mathcal{O}(1/\sqrt{T})$ convergence rate, and experiments on multiple real-world datasets demonstrate a 1-3\% improvement over several state-of-the-art methods.
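
To make the truncation error described in the abstract concrete, the following is a minimal NumPy sketch, not the FedHL algorithm itself. It assumes a hypothetical global low-rank update `delta_w` of rank 16 and assumes that truncation to a client's rank is done by truncated SVD. That choice is an illustrative assumption, since the abstract does not specify the truncation scheme. By the Eckart-Young theorem, truncated SVD gives the smallest Frobenius-norm error of any rank-`r` approximation, so the errors printed here are a lower bound for any other truncation rule. All names (`truncate_to_rank`, `client_ranks`, the matrix sizes) are illustrative.

```python
import numpy as np


def truncate_to_rank(delta_w: np.ndarray, rank: int) -> np.ndarray:
    """Return the best rank-`rank` approximation of delta_w (truncated SVD)."""
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    # Keep only the top `rank` singular triplets.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


rng = np.random.default_rng(0)
d_out, d_in, global_rank = 64, 32, 16  # hypothetical layer and rank sizes

# Hypothetical global LoRA update, stored as the product B @ A.
B = rng.standard_normal((d_out, global_rank))
A = rng.standard_normal((global_rank, d_in))
delta_w = B @ A

# Clients with smaller ranks receive a truncated copy of the global update.
client_ranks = [2, 4, 8, 16]
errors = {}
for r in client_ranks:
    approx = truncate_to_rank(delta_w, r)
    # Relative Frobenius-norm error introduced by truncation alone.
    errors[r] = np.linalg.norm(delta_w - approx) / np.linalg.norm(delta_w)
    print(f"client rank {r:2d}: relative truncation error = {errors[r]:.4f}")
```

In this sketch, only the client whose rank matches the global rank starts from an exact copy of the global update. Every lower-rank client begins its local training from a perturbed model, which is the starting point of the bias the abstract describes.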
