Heterogeneous LoRA for Federated Fine-tuning of On-Device Foundation Models
Yae Jee Cho, Luyang Liu, Zheng Xu, Aldi Fahrezi, Gauri Joshi
TL;DR
This work tackles privacy-preserving federated fine-tuning of on-device foundation models with HetLoRA, a method that lets clients use different LoRA ranks and combines their updates through local rank self-pruning and sparsity-weighted aggregation. The authors formalize the FL objective with variable per-client ranks and implement HetLoRA as a three-step workflow. First, the server distributes the global LoRA modules to each client by truncating them to the client's rank. Next, each client trains locally and prunes its own rank. Finally, the server combines the returned modules with sparsity-weighted aggregation. This design balances convergence speed against generalization while training far fewer parameters than full fine-tuning. Experiments on PaLM 2 XXS and XS, measured by RougeL on Reddit and by perplexity on multi-session chat, show HetLoRA outperforming homogeneous-LoRA baselines and reconstruction-based methods. HetLoRA also approaches full fine-tuning performance at a fraction of the communication and computation cost. The method shows practical potential for on-device, privacy-preserving adaptation of small to mid-sized foundation models across heterogeneous devices, and leaves convergence theory and rank-assignment strategies for future work.
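The three-step workflow can be illustrated with a small NumPy sketch. It is a minimal sketch under stated assumptions, not the authors' implementation. The global LoRA pair (`B`, `A`) is kept at a maximum rank `r_max`, and truncation keeps the first `r` columns of `B` and rows of `A`. The pruning rule in the hypothetical `self_prune` is an assumption: it drops the trailing `gamma` fraction of ranks when their share of `B @ A` falls below `tol`. Weighting each client by the Frobenius norm of its `B @ A` is likewise an assumed stand-in for sparsity-weighted aggregation. All function names are illustrative.

```python
import numpy as np


def truncate(B_global, A_global, rank):
    """Step 1: hand a client the first `rank` columns of B and rows of A."""
    return B_global[:, :rank].copy(), A_global[:rank, :].copy()


def self_prune(B, A, gamma=0.5, tol=1e-3):
    """Step 2 (hypothetical rule): drop the trailing `gamma` fraction of ranks
    when their contribution to B @ A is negligible relative to the full update."""
    r = B.shape[1]
    keep = max(1, int(round(r * (1 - gamma))))
    tail = np.linalg.norm(B[:, keep:] @ A[keep:, :])  # Frobenius norm
    total = np.linalg.norm(B @ A)
    if total > 0 and tail / total < tol:
        return B[:, :keep], A[:keep, :]
    return B, A


def aggregate(client_modules, r_max):
    """Step 3: zero-pad every client's (B, A) to r_max and average them with
    weights proportional to the Frobenius norm of each client's B @ A
    (an assumed stand-in for sparsity-weighted aggregation)."""
    norms = np.array([np.linalg.norm(B @ A) for B, A in client_modules])
    if norms.sum() > 0:
        weights = norms / norms.sum()
    else:  # all updates are zero: fall back to uniform weights
        weights = np.full(len(client_modules), 1.0 / len(client_modules))
    d_out = client_modules[0][0].shape[0]
    d_in = client_modules[0][1].shape[1]
    B_new = np.zeros((d_out, r_max))
    A_new = np.zeros((r_max, d_in))
    for w, (B, A) in zip(weights, client_modules):
        r = B.shape[1]
        B_new[:, :r] += w * B  # columns beyond r stay zero for this client
        A_new[:r, :] += w * A
    return B_new, A_new


# Usage: one round with three clients of different ranks (local training omitted).
rng = np.random.default_rng(0)
r_max, d = 8, 16
B_g, A_g = rng.normal(size=(d, r_max)), rng.normal(size=(r_max, d))
clients = [self_prune(*truncate(B_g, A_g, r)) for r in (2, 4, 8)]
B_g, A_g = aggregate(clients, r_max)
print(B_g.shape, A_g.shape)  # (16, 8) (8, 16)
```

Zero-padding lets a low-rank client contribute only to the leading ranks of the global module. Under this sketch's weighting, a client whose update is close to zero has correspondingly little influence on the aggregate.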
Abstract
Foundation models (FMs) adapt well to specific domains or tasks through fine-tuning, and federated learning (FL) makes it possible to fine-tune FMs on local on-device data while preserving privacy. For federated fine-tuning, we consider FMs of small to medium size, with at most a single-digit number of billions of parameters. We refer to these as on-device FMs (ODFMs): they can be deployed on devices for inference but can only be fine-tuned with parameter-efficient methods. In this work, we tackle the data and system heterogeneity problems of federated fine-tuning of ODFMs by proposing HetLoRA, a novel method based on heterogeneous low-rank adaptation (LoRA) modules. We first show that the naive approach of using homogeneous LoRA ranks across devices faces a trade-off between overfitting and slow convergence. HetLoRA instead allows heterogeneous ranks across client devices and efficiently aggregates and distributes these heterogeneous LoRA modules. By applying rank self-pruning locally and sparsity-weighted aggregation at the server, HetLoRA combines the advantages of high- and low-rank LoRAs, achieving faster convergence and better final performance than homogeneous LoRA. Furthermore, HetLoRA is more computationally efficient than full fine-tuning, making it suitable for federated fine-tuning across heterogeneous devices.
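The efficiency gap over full fine-tuning follows from standard LoRA parameter counting. A dense `d_out x d_in` weight has `d_out * d_in` trainable entries, whereas a LoRA pair `B` (`d_out x r`) and `A` (`r x d_in`) has `r * (d_out + d_in)`. The back-of-the-envelope sketch below assumes a hypothetical 4096 x 4096 layer and hypothetical per-device ranks; these are not PaLM 2's actual dimensions or the ranks used in the experiments.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when fine-tuning a dense d_out x d_in weight."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a LoRA pair B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and per-device ranks, chosen only for illustration.
d_out = d_in = 4096
for rank in (2, 8, 32):
    lora = lora_params(d_out, d_in, rank)
    frac = lora / full_params(d_out, d_in)
    print(f"rank {rank:>2}: {lora:,} trainable params ({frac:.2%} of full)")
```

For example, rank 8 trains 65,536 parameters for this layer, compared with 16,777,216 for the dense matrix. Because the count grows linearly in `r`, heterogeneous ranks let weaker devices train and communicate proportionally smaller modules.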
