Federated Fine-tuning of Large Language Models under Heterogeneous Tasks and Client Resources

Jiamu Bai; Daoyuan Chen; Bingchen Qian; Liuyi Yao; Yaliang Li

TL;DR

Federated fine-tuning of billion-scale LLMs is hampered by data and resource heterogeneity across clients, leading to a bucket effect where resource-rich clients underutilize their capacity. FlexLoRA offers a simple, plug-in aggregation mechanism that allows clients to contribute LoRA weights at heterogeneous ranks, averaging them into a full-size representation and redistributing via SVD to preserve each client’s resource constraints. The authors provide a theoretical generalization bound under Lipschitz and SVD-approximation assumptions and validate the approach with extensive cross-device experiments across thousands of NLP tasks, showing consistent improvements over state-of-the-art FL baselines in zero-shot and cross-task settings. The work demonstrates practical, privacy-preserving federated tuning for LLMs and highlights scalability and applicability to edge devices, with open-source code and broad compatibility with existing LoRA-based FL methods. Overall, FlexLoRA advances resource-aware FL by enabling broader knowledge transfer across heterogeneous client populations.

Abstract

Federated Learning (FL) has recently been applied to the parameter-efficient fine-tuning of Large Language Models (LLMs). While promising, it raises significant challenges due to the heterogeneous resources and data distributions of clients. This study introduces FlexLoRA, a simple yet effective aggregation scheme for LLM fine-tuning that mitigates the "bucket effect" in traditional FL, where clients with ample resources are held back to the capabilities of the least-resourced participants. FlexLoRA allows for dynamic adjustment of local LoRA ranks, fostering the development of a global model imbued with broader, less task-specific knowledge. By synthesizing a full-size LoRA weight from individual client contributions and employing Singular Value Decomposition (SVD) for weight redistribution, FlexLoRA fully leverages heterogeneous client resources. In experiments involving thousands of clients with heterogeneous NLP tasks and heterogeneous resources, the federated global model trained with FlexLoRA consistently improves over SOTA FL methods in downstream NLP task performance across various heterogeneous distributions. FlexLoRA's practicality is further underscored by our theoretical analysis and its seamless integration with existing LoRA-based FL methods, offering a path toward cross-device, privacy-preserving federated tuning for LLMs.
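The aggregation step described in the abstract (build a full-size LoRA weight from each client's factors, average, then redistribute with SVD) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `aggregate_full_size` and `redistribute_by_svd`, the matrix shapes, and the weighting of clients by local dataset size are all hypothetical choices made here for the example.

```python
import numpy as np


def aggregate_full_size(client_factors, client_weights):
    """Weighted average of full-size LoRA updates B_i @ A_i.

    Assumption: clients are weighted by local dataset size; the source
    only says the weights are averaged.
    """
    total = float(sum(client_weights))
    return sum((w / total) * (B @ A)
               for (B, A), w in zip(client_factors, client_weights))


def redistribute_by_svd(delta_w, rank):
    """Truncated SVD: rank-`rank` factors (B, A) such that B @ A approximates delta_w."""
    U, s, Vt = np.linalg.svd(delta_w, full_matrices=False)
    B = U[:, :rank] * s[:rank]  # (d_out, rank); singular values folded into B
    A = Vt[:rank, :]            # (rank, d_in)
    return B, A


# Toy setup (all values hypothetical): three clients with different LoRA ranks.
rng = np.random.default_rng(0)
d_out, d_in = 16, 12
client_ranks = [1, 4, 8]
client_sizes = [100, 300, 600]

client_factors = [
    (rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in)))
    for r in client_ranks
]

# Server side: every client contributes a full-size (d_out, d_in) update,
# so heterogeneous ranks can be averaged directly.
global_delta = aggregate_full_size(client_factors, client_sizes)

# Each client receives factors truncated to the rank it can afford.
redistributed = {r: redistribute_by_svd(global_delta, r) for r in client_ranks}
```

Because the average is taken over the products `B @ A` rather than over the factors themselves, clients with different ranks contribute matrices of the same shape. Truncated SVD then returns the best approximation of the averaged update at each client's rank (in the Frobenius-norm sense), so a low-rank client loses some of the aggregated information while a high-rank client retains more of it.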

Paper Structure (29 sections, 1 theorem, 3 equations, 9 figures, 13 tables, 2 algorithms)

Introduction
Related Work
Methodology of FlexLoRA
Intrinsic Dimension and Generalization
Aggregation with Heterogeneous Ranks
Maximizing Local Rank with Local Resources
Generalization Analysis
Experiments
Setup for Cross-Device FL Environments
Setup for FL Baselines
Unseen Client Generalization
Cross-Task Generalization
Aggregation Scheme Study
Scalability Study
Conclusion
...and 14 more sections

Key Result

Theorem 1

Under the Lipschitz assumption (assump:lipschitz) and the bounded-error assumption (assump:bound-error), with probability at least $1-\delta$, there exists a sample size $\widetilde{N} = \mathcal{O}\left( \frac{k}{|\mathcal{C}|\epsilon^2} \log\left( \frac{R L_f L_h}{\epsilon - 2\phi^i L_f L_h} \right) - \frac{\log \delta}{|\mathcal{C}|\epsilon^2} \right)$ such that for

Figures (9)

Figure 1: Test loss of FlexLoRA and FedIT across communication rounds under LoRA ranks of 1, 8, and 200. FlexLoRA demonstrates adaptability in an "extreme heavy tail" scenario and increasingly aligns with the performance of FedIT at the highest LoRA rank as rounds progress. Implementation details are in the appendix.
Figure 2: Illustration of FlexLoRA. The server initially constructs a full-size LoRA weight, which is then averaged across client-contributed weights with different ranks. The aggregated global weights are decoupled via SVD and sent back to clients.
Figure 3: The LoRA configurations that compose heterogeneous resource distributions, detailed in the figure labeled fig:heteroResource.
Figure 4: Task-specific improvements achieved by FlexLoRA in comparison with the homogeneous rank implementation of FedAvg, across different resource distribution settings.
Figure 5: Average percentage improvement of FlexLoRA over baseline methods (FedAvg, FedIT, SLoRA) across different resource distributions, calculated over 12 NLP task categories. A more detailed comparison is presented in the figure labeled fig:nlp_performance_diff_fedavg.
...and 4 more figures

Theorems & Definitions (1)

Theorem 1
