HLoRA: Efficient Federated Learning System for LLM Heterogeneous Fine-Tuning

Qianli Liu, Zhaorui Zhang, Xin Yao, Benben Liu

TL;DR

HLoRA addresses the challenge of fine-tuning large language models in federated settings with heterogeneous client resources by allowing each client to use a LoRA adapter of a different rank. The server reconstructs the aggregated weight update as $W' = \sum_{k=1}^{K} \frac{n_k}{n} B_k A_k$, where $n_k$ is client $k$'s local sample count and $n = \sum_{k} n_k$, and then applies an SVD-based decomposition to produce factors at each client's own rank, mitigating aggregation bias and improving convergence. Empirical results on MRPC, QQP, and RTE using RoBERTa-large within the Plato framework show faster convergence and higher final accuracy than naive or homogeneous-LoRA baselines, highlighting the method’s practical potential for privacy-preserving, resource-diverse deployments. The work advances federated parameter-efficient fine-tuning (PEFT) by providing a principled, scalable approach to rank heterogeneity, with implications for real-world multi-institution collaborations under data privacy constraints.
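The server-side step described in the TL;DR can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `aggregate_lora` and `redistribute` are hypothetical, and splitting $\Sigma$ evenly between the two factors ($B = U\Sigma^{1/2}$, $A = \Sigma^{1/2}V^\top$) is an assumption, since the summary only says the decomposition is SVD-based.

```python
import numpy as np


def aggregate_lora(client_factors, sample_counts):
    """Reconstruct W' = sum_k (n_k / n) * B_k @ A_k from adapters of any rank."""
    n = float(sum(sample_counts))
    d_out = client_factors[0][0].shape[0]
    d_in = client_factors[0][1].shape[1]
    w_prime = np.zeros((d_out, d_in))
    for (b, a), n_k in zip(client_factors, sample_counts):
        # B_k @ A_k is d_out x d_in whatever the client's rank r_k is,
        # so the full-size products can be averaged without any padding.
        w_prime += (n_k / n) * (b @ a)
    return w_prime


def redistribute(w_prime, rank):
    """Hypothetical helper: rank-`rank` truncated SVD of W', split into (B, A)."""
    u, s, vt = np.linalg.svd(w_prime, full_matrices=False)
    root = np.sqrt(s[:rank])            # assumption: split Sigma symmetrically
    b = u[:, :rank] * root              # d_out x rank
    a = root[:, None] * vt[:rank, :]    # rank x d_in
    return b, a


rng = np.random.default_rng(0)
d_out, d_in = 64, 32
ranks = [2, 4, 8]            # one rank per client, e.g. chosen by its resources
counts = [100, 300, 600]     # n_k: local sample counts
clients = [(rng.standard_normal((d_out, r)), rng.standard_normal((r, d_in)))
           for r in ranks]

w_prime = aggregate_lora(clients, counts)
new_factors = {r: redistribute(w_prime, r) for r in ranks}
for r, (b, a) in new_factors.items():
    rel_err = np.linalg.norm(w_prime - b @ a) / np.linalg.norm(w_prime)
    print(f"rank {r}: B {b.shape}, A {a.shape}, relative error {rel_err:.3f}")
```

Each client receives factors matching its own rank, so a low-rank client gets a coarser approximation of $W'$; the truncation error shrinks as the rank grows, and a rank at least equal to the rank of $W'$ recovers it exactly.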

Abstract

Federated learning systems have been identified as an efficient approach to scaling distributed model training across a large number of participants or data owners while guaranteeing data privacy. To apply today's most popular pre-trained large language models to other domains that require data privacy guarantees, existing works propose fine-tuning these models in federated learning environments across data owners using LoRA, a parameter-efficient fine-tuning approach. To address resource and data heterogeneity among participants, previous works adopted heterogeneous LoRA, assigning different ranks to different clients and padding their ranks, which introduces bias into parameter aggregation. To address this issue, we propose HLoRA, an efficient federated learning system utilizing a modified LoRA approach that incorporates rank heterogeneity to optimize communication and computational efficiency. Experimental results on the Microsoft Research Paraphrase Corpus (MRPC), Quora Question Pairs (QQP), and Recognizing Textual Entailment (RTE), conducted within the Plato federated learning framework, demonstrate that our method not only reduces resource demands but also outperforms traditional LoRA applications in terms of convergence speed and final model accuracy. This study shows that our approach can significantly improve the practical deployment of federated LLM fine-tuning, particularly in environments with diverse client resources.
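The abstract attributes aggregation bias to the way earlier heterogeneous-LoRA schemes reconcile different client ranks. The sketch below assumes those schemes zero-pad every client's factors to the largest rank and then average $B$ and $A$ separately (FedAvg applied to the factors); this is a common baseline chosen for illustration, not necessarily the exact scheme the paper critiques, and `pad_factors` is a hypothetical helper.

```python
import numpy as np


def pad_factors(b, a, r_max):
    """Zero-pad a rank-r LoRA pair (B: d_out x r, A: r x d_in) to rank r_max."""
    d_out, r = b.shape
    d_in = a.shape[1]
    b_pad = np.zeros((d_out, r_max))
    b_pad[:, :r] = b
    a_pad = np.zeros((r_max, d_in))
    a_pad[:r, :] = a
    return b_pad, a_pad


rng = np.random.default_rng(1)
d_out, d_in = 16, 12
ranks = [2, 4]
weights = np.array([0.5, 0.5])   # equal n_k / n for both clients
clients = [(rng.standard_normal((d_out, r)), rng.standard_normal((r, d_in)))
           for r in ranks]

# Naive scheme: pad to the largest rank, then average B and A separately.
r_max = max(ranks)
padded = [pad_factors(b, a, r_max) for b, a in clients]
b_avg = sum(w * b for w, (b, _) in zip(weights, padded))
a_avg = sum(w * a for w, (_, a) in zip(weights, padded))
naive_update = b_avg @ a_avg

# Target: the weighted average of each client's actual update B_k A_k.
true_update = sum(w * (b @ a) for w, (b, a) in zip(weights, clients))

rel_bias = np.linalg.norm(naive_update - true_update) / np.linalg.norm(true_update)
print(f"relative error of padded factor averaging: {rel_bias:.2f}")
```

Padding by itself leaves each client's update $B_k A_k$ unchanged; the error comes from the product of averages, which contains cross terms $w_i w_j B_i A_j$ for $i \neq j$ and scales each client's own term by $w_k^2$ instead of $w_k$. The same effect appears even when all ranks are equal.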

Paper Structure

This paper contains 22 sections, 3 equations, 3 figures, 1 table, 1 algorithm.

Figures (3)

  • Figure 1: For illustration, consider a consortium of three hospitals aiming to develop an LLM for medical diagnostics. Each hospital's data, while valuable, is insufficient in isolation; combined, it could substantially improve medical LLM training. However, stringent data privacy laws often prevent such collaboration, compounding the dual challenges of data scarcity and privacy preservation.
  • Figure 2: Compared with directly applying LoRA, our design reconstructs the full weight-update matrix to achieve optimal weight aggregation, and can simultaneously aggregate adapters with heterogeneous ranks across clients.
  • Figure 3: Comparative Performance Analysis of Federated LoRA Implementations. Sub-figure (a) compares the naive implementation with reconstructed-matrix re-decomposition under rank homogeneity; the latter converges faster and reaches higher final performance. Sub-figure (b) compares reconstructed-matrix re-decomposition under rank homogeneity with rank heterogeneity: the heterogeneous configuration converges more slowly but achieves higher long-term accuracy. These comparisons underscore the impact of rank configuration on the efficacy of federated LoRA fine-tuning.
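Figure 2 credits the reconstruction with an optimal aggregation effect. Assuming the re-decomposition is a truncated SVD with $\Sigma$ split evenly between the two factors (the caption does not specify the split), the Eckart–Young theorem gives one precise reading of "optimal": each client's new factors form the best rank-$r_k$ approximation of $W'$ in Frobenius norm, with a residual set by the discarded singular values $\sigma_i$. The symbols $d$, $m$, $U$, $\Sigma$, $V$, $B_k'$, and $A_k'$ are introduced here for illustration only.

```latex
W' = \sum_{k=1}^{K} \frac{n_k}{n} B_k A_k = U \Sigma V^\top,
\qquad W' \in \mathbb{R}^{d \times m}

B_k' = U_{:,\,1:r_k} \, \Sigma_{1:r_k}^{1/2},
\qquad
A_k' = \Sigma_{1:r_k}^{1/2} \, V_{:,\,1:r_k}^\top

\bigl\lVert W' - B_k' A_k' \bigr\rVert_F
= \min_{\operatorname{rank}(M) \le r_k} \bigl\lVert W' - M \bigr\rVert_F
= \Bigl( \sum_{i > r_k} \sigma_i^2 \Bigr)^{1/2}

\text{per-client payload: } r_k (d + m) \text{ parameters, versus } d\,m \text{ for a full update}
```

Under this reading, a client with a larger rank $r_k$ receives a closer approximation of the aggregated update at a proportionally higher communication cost.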