Federated LoRA with Sparse Communication
Kevin Kuo, Arian Raje, Kousik Rajesh, Virginia Smith
TL;DR
This paper tackles the high communication cost of fine-tuning large pretrained models in cross-device federated learning (FL), focusing on LoRA as the parameter-efficient fine-tuning method. It introduces FLASC, which communicates sparse LoRA updates while keeping local fine-tuning dense, and which allows the download and upload sparsity patterns to differ. Empirically, FLASC matches dense LoRA performance across multiple tasks with up to a $10\times$ reduction in communication, and it shows benefits under data heterogeneity and privacy constraints. It also outperforms existing pruning-based baselines, which highlights the importance of aligning sparsity with FL constraints. The work positions FLASC as a strong, simple baseline for future federated fine-tuning efforts and motivates further exploration of system-aware efficiency strategies and scalability to larger models.
Abstract
Low-rank adaptation (LoRA) is a natural method for fine-tuning in communication-constrained machine learning settings such as cross-device federated learning. Prior work on LoRA in federated learning has focused on improving LoRA's robustness to heterogeneity and privacy. In this work, we instead consider techniques for further improving communication efficiency in federated LoRA. Unfortunately, we show that centralized ML methods that improve the efficiency of LoRA through unstructured pruning do not transfer well to federated settings. We instead study a simple approach, \textbf{FLASC}, that applies sparsity to LoRA during communication while allowing clients to locally fine-tune the entire LoRA module. Across four common federated learning tasks, we demonstrate that this method matches the performance of dense LoRA with up to $10\times$ less communication. Additionally, although the approach is designed primarily to reduce communication, we find that it has benefits in terms of heterogeneity and privacy relative to existing approaches tailored to these specific concerns. Overall, our work highlights the importance of considering system-specific constraints when developing communication-efficient fine-tuning approaches, and serves as a simple and competitive baseline for future work in federated fine-tuning.
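To make the idea of sparse communication with dense local training concrete, the following is a minimal sketch of one federated round, not the authors' implementation. Several details are assumptions made only for illustration: the LoRA parameters are flattened into a single NumPy vector, sparsity is chosen by top-k magnitude, the server keeps a dense copy of the weights and adds the averaged sparse updates to it, and the names `top_k_mask`, `client_update`, `sparse_round`, `down_density`, and `up_density` are hypothetical.

```python
import numpy as np


def top_k_mask(x, density):
    """Boolean mask selecting the largest-magnitude entries of x.

    `density` is the fraction of entries kept (assumed selection rule).
    """
    k = max(1, int(round(density * x.size)))
    # argpartition places the k largest |x| values in the last k positions.
    idx = np.argpartition(np.abs(x), x.size - k)[x.size - k:]
    mask = np.zeros(x.size, dtype=bool)
    mask[idx] = True
    return mask


def client_update(received, grad_fn, up_density, steps=5, lr=0.1):
    """Dense local training followed by a sparsified upload."""
    w = received.copy()
    for _ in range(steps):
        # Dense step: every LoRA entry may change, including entries
        # that were zeroed out in the download.
        w -= lr * grad_fn(w)
    delta = w - received
    # Upload: keep only the largest-magnitude entries of the update.
    return np.where(top_k_mask(delta, up_density), delta, 0.0)


def sparse_round(server_w, client_grad_fns, down_density, up_density):
    """One round with separate download and upload sparsity levels."""
    # Download: the server sends only the top-k entries of its weights.
    sent = np.where(top_k_mask(server_w, down_density), server_w, 0.0)
    updates = [client_update(sent, g, up_density) for g in client_grad_fns]
    # Assumption: the server applies the averaged update to its dense copy.
    return server_w + np.mean(updates, axis=0)
```

In this sketch the download and upload masks are computed independently, so the two directions can use different densities and different supports. The per-round payload is governed by `down_density` and `up_density` rather than by the size of the LoRA module, while the local optimization remains unconstrained.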
