pFedLoRA: Model-Heterogeneous Personalized Federated Learning with LoRA Tuning
Liping Yi, Han Yu, Gang Wang, Xiaoguang Liu, Xiaoxiao Li
TL;DR
This paper tackles model heterogeneity in federated learning by proposing pFedLoRA, which inserts a small, homogeneous, low-rank adapter into each client's heterogeneous local model to enable cross-client knowledge transfer. An iterative learning scheme trains the heterogeneous models and the adapters in alternating phases, after which the server aggregates the adapters FedAvg-style to form a global adapter. The authors prove a non-convex convergence rate of $O(1/T)$ and report that pFedLoRA achieves higher accuracy on CIFAR-10/100 under non-IID conditions, with $11.81\times$ lower computation overhead and $7.41\times$ lower communication cost than baselines. The approach enables privacy-preserving, communication-efficient, model-heterogeneous personalized FL with robust performance across varying data distributions and client participation.
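The FedAvg-style adapter aggregation mentioned above can be sketched as a weighted average over the clients' adapters. This is a minimal illustration, not the authors' implementation: the flat `Adapter` dictionary, the function name `aggregate_adapters`, the parameter names `"A"` and `"B"`, and the weighting by local sample count (the usual FedAvg choice) are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical flat representation of an adapter: parameter name -> values.
Adapter = Dict[str, List[float]]


def aggregate_adapters(adapters: List[Adapter], num_samples: List[int]) -> Adapter:
    """Average homogeneous adapters, weighting each client by its sample count."""
    if not adapters or len(adapters) != len(num_samples):
        raise ValueError("need one sample count per adapter")
    total = sum(num_samples)
    global_adapter: Adapter = {}
    # All adapters share the same structure, so the first one defines the keys.
    for name in adapters[0]:
        merged = [0.0] * len(adapters[0][name])
        for adapter, n in zip(adapters, num_samples):
            for i, value in enumerate(adapter[name]):
                merged[i] += (n / total) * value
        global_adapter[name] = merged
    return global_adapter


# Two toy clients with made-up adapter values and sample counts.
clients = [
    {"A": [1.0, 2.0], "B": [0.0]},
    {"A": [3.0, 4.0], "B": [4.0]},
]
global_adapter = aggregate_adapters(clients, num_samples=[1, 3])
```

Averaging is only well defined because the adapters are homogeneous. The heterogeneous local models never leave the clients, so their differing architectures do not need to match.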
Abstract
Federated learning (FL) is an emerging machine learning paradigm in which a central server coordinates multiple participants (clients) to collaboratively train models on decentralized data. In practice, FL often faces statistical, system, and model heterogeneity, which has inspired the field of Model-Heterogeneous Personalized Federated Learning (MHPFL). With growing interest in adopting large language models (LLMs) in FL, existing MHPFL methods cannot keep computational and communication costs acceptable while maintaining satisfactory model performance. To bridge this gap, we propose a novel and efficient model-heterogeneous personalized federated learning framework based on LoRA tuning (pFedLoRA). Inspired by the popular LoRA method for fine-tuning pre-trained LLMs with a low-rank model (a.k.a. an adapter), we design a small homogeneous adapter that facilitates the training of each federated client's heterogeneous local model, using our proposed iterative training scheme for global-local knowledge exchange. The small homogeneous local adapters are aggregated on the FL server to generate a global adapter. We theoretically prove the convergence of pFedLoRA. Extensive experiments on two benchmark datasets demonstrate that pFedLoRA outperforms six state-of-the-art baselines, beating the best method by 1.35% in test accuracy while reducing computation overhead by 11.81 times and communication cost by 7.41 times.
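To give a rough sense of why exchanging a low-rank adapter is cheap, the sketch below counts parameters for the standard LoRA factorization, in which a $d \times k$ weight update is represented as the product of a $d \times r$ and an $r \times k$ matrix. The dimensions are hypothetical, and this is not the paper's adapter architecture, so the resulting ratio is illustrative and unrelated to the reported savings figures.

```python
def dense_params(d: int, k: int) -> int:
    """Parameter count of a full d x k weight matrix."""
    return d * k


def low_rank_params(d: int, k: int, r: int) -> int:
    """Parameter count of a rank-r factorization: (d x r) plus (r x k)."""
    return r * (d + k)


# Hypothetical layer size and rank, chosen only for illustration.
d, k, r = 512, 512, 8
dense = dense_params(d, k)
low_rank = low_rank_params(d, k, r)
ratio = dense / low_rank  # how many times fewer values need to be exchanged
```

With these assumed sizes the factorized form holds 8,192 values instead of 262,144, a 32-fold reduction in what would have to be sent per round for that layer.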
