Federated LoRA with Sparse Communication

Kevin Kuo, Arian Raje, Kousik Rajesh, Virginia Smith

TL;DR

This paper tackles the high communication cost of fine-tuning large pretrained models in cross-device federated learning (FL), using LoRA as the parameter-efficient fine-tuning method. It introduces FLASC, which sparsifies LoRA only when it is communicated, keeps local fine-tuning dense, and allows different sparsity levels for download and upload. Empirically, FLASC matches dense LoRA performance across four FL tasks with up to $10\times$ less communication, and it also shows benefits under data heterogeneity and privacy constraints. It outperforms existing pruning-based baselines, which highlights the importance of aligning sparsity with FL system constraints. The work positions FLASC as a simple, strong baseline for future federated fine-tuning efforts and motivates further exploration of system-aware efficiency strategies and scalability to larger models.

Abstract

Low-rank adaptation (LoRA) is a natural method for finetuning in communication-constrained machine learning settings such as cross-device federated learning. Prior work that has studied LoRA in the context of federated learning has focused on improving LoRA's robustness to heterogeneity and privacy. In this work, we instead consider techniques for further improving communication-efficiency in federated LoRA. Unfortunately, we show that centralized ML methods that improve the efficiency of LoRA through unstructured pruning do not transfer well to federated settings. We instead study a simple approach, FLASC, that applies sparsity to LoRA during communication while allowing clients to locally fine-tune the entire LoRA module. Across four common federated learning tasks, we demonstrate that this method matches the performance of dense LoRA with up to $10\times$ less communication. Additionally, despite being designed primarily to target communication, we find that this approach has benefits in terms of heterogeneity and privacy relative to existing approaches tailored to these specific concerns. Overall, our work highlights the importance of considering system-specific constraints when developing communication-efficient finetuning approaches, and serves as a simple and competitive baseline for future work in federated finetuning.
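
The idea of sparsifying LoRA only at communication time, while the client trains all LoRA entries, can be illustrated with a small sketch. This is not the authors' implementation: the function names `top_k_sparsify` and `client_round`, the flat-list representation of the LoRA parameters, and the `local_update_fn` placeholder for local training are all assumptions made for illustration. Magnitude-based top-k selection is used as the sparsity criterion, with separate densities for download and upload.

```python
def top_k_sparsify(values, density):
    """Keep the `density` fraction of entries with largest magnitude; zero the rest."""
    k = max(1, int(density * len(values)))
    # Rank indices by absolute value, largest first, and keep the top k.
    ranked = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def client_round(server_weights, local_update_fn, down_density, up_density):
    """One hypothetical client round with sparse download and sparse upload."""
    # Download: the client receives a sparse copy of the server's LoRA weights.
    received = top_k_sparsify(server_weights, down_density)
    # Local fine-tuning is dense: every LoRA entry is free to change.
    trained = local_update_fn(received)
    # Upload: only the largest-magnitude entries of the update are sent back.
    delta = [t - r for t, r in zip(trained, received)]
    return top_k_sparsify(delta, up_density)


if __name__ == "__main__":
    weights = [0.1, -3.0, 0.5, 2.0]
    # Placeholder for local training: perturb every entry by a different amount.
    fake_training = lambda w: [x + 0.1 * (i + 1) for i, x in enumerate(w)]
    print(client_round(weights, fake_training, down_density=0.5, up_density=0.25))
```

In this sketch the download mask depends on weight magnitudes and the upload mask on update magnitudes, so the two sparsity patterns are free to differ from each other and from round to round.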

Paper Structure (17 sections, 8 figures, 1 table, 1 algorithm)

Figures (8)

  • Figure 1: A step-by-step overview of FLASC. Step 0 is executed prior to FL training, while training repeats steps 1-6. Blue/red squares indicate the magnitude of weights/updates respectively. Darker squares indicate a larger magnitude, which is the ranking criterion $(\ell_1)$ used for sparsity.
  • Figure 2: We compare utility ($\uparrow$) vs. total communication when augmenting LoRA (rank $r=16$) with sparsity. Out of all four methods, FLASC reaches the highest utility with the least communication. In contrast, Adapter LTH is inefficient early in training and SparseAdapter fails to match the utility of LoRA. Shaded bands show the min/mean/max utility over 3 random seeds.
  • Figure 3: We measure the communication time ($\downarrow$) needed to reach 70% accuracy on 20NewsGroups. Beyond efficiency in terms of total communication $(1\times)$, FLASC (green) is robust to extremely slow upload speed $(16\times)$ by making upload more sparse than download. Hatched bars indicate that SparseAdapter failed to reach 70% accuracy.
  • Figure 4: We compare the accuracy ($\uparrow$) of FLASC to two ways of freezing weights while training an unstructured sparse LoRA ($r=16$) module with FedAdam.
  • Figure 5: We show accuracy ($\uparrow$) in settings with varying label heterogeneity. We reduce communication using a) lower LoRA rank or b) sparsity with FLASC. Bars are grouped by communication cost and ordered by increasing heterogeneity (decreasing $\alpha$). We find that tuning the rank is important; FLASC with $r=16$ can outperform full finetuning and smaller ranks with similar communication.
  • ...and 3 more figures
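
The Figure 3 caption notes that making upload sparser than download helps when upload is slow. A simple cost model shows why. The sketch below is an assumption-laden illustration, not the paper's measurement setup: the function `comm_time`, the linear transfer-time model, and all numeric values are hypothetical, and the overhead of encoding sparse indices is ignored.

```python
def comm_time(num_params, down_density, up_density, down_rate, upload_slowdown):
    """Per-round transfer time (arbitrary units) under a linear cost model."""
    # Upload is assumed to be `upload_slowdown` times slower than download.
    up_rate = down_rate / upload_slowdown
    download = (down_density * num_params) / down_rate
    upload = (up_density * num_params) / up_rate
    return download + upload


if __name__ == "__main__":
    # Hypothetical setting: 1000 parameters, upload 16x slower than download.
    dense = comm_time(1000, 1.0, 1.0, 100.0, 16.0)
    # Same total number of communicated values, split two different ways.
    symmetric = comm_time(1000, 0.25, 0.25, 100.0, 16.0)
    asymmetric = comm_time(1000, 0.4375, 0.0625, 100.0, 16.0)
    print(dense, symmetric, asymmetric)
```

Under these assumed numbers, the symmetric and asymmetric configurations send the same total number of values, but shifting sparsity toward the slow upload link gives a noticeably lower transfer time.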