Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning

Raghav Singhal, Kaustubh Ponkshe, Rohit Vartak, Lav R. Varshney, Praneeth Vepakomma

TL;DR

The paper tackles the inefficiencies of federated fine-tuning with LoRA, where inexact aggregation and privacy-induced noise hinder performance and scalability. It proposes Fed-SB, a federated adaptation of LoRA-SB in which frozen adapters $A$ and $B$ are distributed to all clients and each of the $c$ clients trains only a small square matrix $R_i$. Because $A$ and $B$ are shared and fixed, simple averaging gives exact aggregation: $R^{agg} = \frac{1}{c} \sum_{i=1}^{c} R_i$ and $\Delta W^{agg} = B R^{agg} A$. The communication cost is therefore independent of the number of clients, and the smaller number of learnable parameters lowers the noise required for differential privacy (DP). Fed-SB attains state-of-the-art results across multiple benchmarks and models in both non-private and privacy-preserving federated fine-tuning. The approach also supports rank-heterogeneous clients and demonstrates substantial memory and time efficiency, making it a scalable, privacy-friendly solution for federated fine-tuning of large language models with LoRA-based adapters.
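The averaging rule above can be checked numerically. The following is a minimal sketch in pure Python, not the authors' implementation: the toy sizes (`m`, `n`, `r`, `c`), the random matrices, and all helper names are assumptions made for illustration. It compares averaging the $R_i$ under shared frozen $B$ and $A$ against a contrast case in which every client has its own trainable factors and the two factors are averaged separately.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def mean(mats):
    """Element-wise average of equally shaped matrices."""
    k = len(mats)
    return [[sum(vals) / k for vals in zip(*rows)] for rows in zip(*mats)]

def rand(rows, cols, rng):
    """Random Gaussian matrix (stand-in for trained weights)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def max_abs_diff(X, Y):
    """Largest absolute entry-wise difference between two matrices."""
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

rng = random.Random(0)
m, n, r, c = 6, 5, 2, 4  # hypothetical toy sizes: m x n weight, rank r, c clients

# Fed-SB setting: B (m x r) and A (r x n) are frozen and shared;
# each client i trains only its own R_i (r x r).
B = rand(m, r, rng)
A = rand(r, n, rng)
R_clients = [rand(r, r, rng) for _ in range(c)]

R_agg = mean(R_clients)                                   # R^agg = (1/c) sum_i R_i
delta_w_fedsb = matmul(matmul(B, R_agg), A)               # B R^agg A
delta_w_ideal = mean([matmul(matmul(B, R), A) for R in R_clients])
fedsb_gap = max_abs_diff(delta_w_fedsb, delta_w_ideal)

# Contrast case: each client trains its own B_i and A_i, and the server
# averages the two factors separately before multiplying them.
B_clients = [rand(m, r, rng) for _ in range(c)]
A_clients = [rand(r, n, rng) for _ in range(c)]
naive = matmul(mean(B_clients), mean(A_clients))
ideal = mean([matmul(Bi, Ai) for Bi, Ai in zip(B_clients, A_clients)])
lora_gap = max_abs_diff(naive, ideal)

# Per adapted weight matrix, the aggregated matrix sent back has r * r entries,
# regardless of how many clients took part.
params_per_matrix = len(R_agg) * len(R_agg[0])

print("Fed-SB aggregation gap:", fedsb_gap)
print("Separate-factor averaging gap:", lora_gap)
print("Aggregated parameters per matrix:", params_per_matrix)
```

The first gap is zero up to floating-point rounding because $B R A$ is linear in $R$ when $B$ and $A$ are fixed. The second gap is generally nonzero because the mean of products differs from the product of means.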

Abstract

Low-Rank Adaptation (LoRA) has become ubiquitous for efficiently fine-tuning foundation models. However, federated fine-tuning using LoRA is challenging due to suboptimal updates arising from traditional federated averaging of individual adapters. Existing solutions either incur prohibitively high communication cost that scales linearly with the number of clients or suffer from performance degradation due to limited expressivity. We introduce Federated Silver Bullet (Fed-SB), a novel approach for federated fine-tuning of LLMs using LoRA-SB, a recently proposed low-rank adaptation method. LoRA-SB optimally aligns the optimization trajectory with the ideal low-rank full fine-tuning projection by learning a small square matrix (R) between adapters B and A, keeping other components fixed. Direct averaging of R guarantees exact updates, substantially reducing communication cost, which remains independent of the number of clients, and enables scalability. Fed-SB achieves state-of-the-art performance across commonsense reasoning, arithmetic reasoning, and language inference tasks while reducing communication costs by up to 230x. In private settings, Fed-SB further improves performance by (1) reducing trainable parameters, thereby lowering the noise required for differential privacy and (2) avoiding noise amplification introduced by other methods. Overall, Fed-SB offers a state-of-the-art, efficient, and scalable solution for both private and non-private federated fine-tuning. Our code is publicly available at: https://github.com/CERT-Lab/fed-sb.
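The second privacy point can be illustrated with a short algebraic sketch. It rests on an assumption not spelled out in the abstract: that the amplification arises when noise is added to two trainable factors that are later multiplied. The noise terms $\xi_B$, $\xi_A$, and $\xi_R$ are hypothetical symbols introduced here.

```latex
\begin{align*}
(B + \xi_B)(A + \xi_A) &= BA + \xi_B A + B\,\xi_A + \xi_B\,\xi_A, \\
B\,(R + \xi_R)\,A &= BRA + B\,\xi_R\,A.
\end{align*}
```

Under this assumption, the first line contains a term $\xi_B\,\xi_A$ that is quadratic in the noise, whereas the second line stays linear in the noise because $B$ and $A$ are frozen and only $R$ is perturbed.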

Paper Structure

This paper contains 21 sections, 3 theorems, 22 equations, 6 figures, 13 tables.

Key Result

Lemma 1

Consider a model with $d$ learnable parameters trained using DP-SGD. The privacy parameter $\epsilon$ for $\delta$-approximate differential privacy, given $T$ training steps and a batch size of $q$, is expressed as:

Figures (6)

  • Figure 1: Performance vs. communicated parameter cost (log scale) for Fed-SB and other federated fine-tuning methods in both non-private and privacy-preserving federated settings. Fed-SB advances the performance-communication cost Pareto frontier across all models and tasks, achieving state-of-the-art accuracy while significantly reducing communication cost. Communicated parameters are in thousands for BERT and millions for other models.
  • Figure 2: Fed-SB: Our method achieves optimal exact aggregation by averaging only the $r\times r$ matrices $\mathbf{R}_i$, significantly reducing communication costs.
  • Figure 3: Performance vs. number of communicated parameters (in log scale) for various methods in federated fine-tuning across multiple models on arithmetic and commonsense reasoning tasks.
  • Figure 4: Performance comparison of various methods in centralized (Cent.) private and federated private fine-tuning (BERT-base) on SNLI across varying values of $\epsilon$.
  • Figure 5: Performance vs. number of trainable parameters (in log scale) for various methods in centralized private fine-tuning (BERT-base) across different privacy budgets ($\epsilon$).
  • ...and 1 more figure

Theorems & Definitions (5)

  • Lemma 1
  • proof
  • Lemma
  • proof
  • Theorem