
Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning

Arian Raje, Baris Askin, Divyansh Jhunjhunwala, Gauri Joshi

TL;DR

Ravan tackles federated fine-tuning of LLMs under data and computational heterogeneity by introducing an adaptive multi-head LoRA framework. By reparameterizing the update as a sum of $h$ heads $s_i\mathbf{B}_i\mathbf{H}_i\mathbf{A}_i$ with frozen bases $\mathbf{B}_i, \mathbf{A}_i$ and trainable $\mathbf{H}_i$ and $s_i$, it increases the effective rank of the update while preserving exact aggregation and without increasing communication cost. Across vision and language benchmarks, Ravan consistently outperforms prior PEFT baselines, with larger gains in non-IID settings, and it scales to larger models, such as LLaMA on GLUE tasks. This approach enables robust, edge-efficient fine-tuning of LLMs on on-device data across heterogeneous hardware. The work also provides thorough ablations on initialization, head selection, and scaling factors, and outlines practical guidelines for deploying Ravan in cross-device FL scenarios.
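The exact-aggregation property follows from linearity: because every client shares the same frozen $\mathbf{B}_i$ and $\mathbf{A}_i$, averaging the uploaded products $s_i\mathbf{H}_i$ on the server gives exactly the average of the clients' full updates. The NumPy sketch here checks this numerically, with random stand-ins for locally trained values. The shapes, the hypothetical helper `multihead_update`, and plain unweighted averaging (rather than, say, weighting clients by data size) are illustrative assumptions, not the authors' code. For contrast, it also shows that averaging vanilla LoRA factors $\mathbf{B}$ and $\mathbf{A}$ separately does not, in general, reproduce the average of their products.

```python
import numpy as np

def multihead_update(B, H, A, s):
    """Hypothetical helper: Delta W = sum_i s_i * B_i @ H_i @ A_i over h heads."""
    return sum(s_i * B_i @ H_i @ A_i for s_i, B_i, H_i, A_i in zip(s, B, H, A))

rng = np.random.default_rng(0)
d_out, d_in, r, h, num_clients = 32, 48, 4, 3, 5  # illustrative sizes

# Frozen bases, identical on the server and on every client.
B = [rng.standard_normal((d_out, r)) for _ in range(h)]
A = [rng.standard_normal((r, d_in)) for _ in range(h)]

# Random stand-ins for each client's locally trained cores H_i and scales s_i.
client_H = [[rng.standard_normal((r, r)) for _ in range(h)] for _ in range(num_clients)]
client_s = [rng.standard_normal(h) for _ in range(num_clients)]

# Each client uploads only the h products s_i * H_i (r x r each).
uploads = [[s[i] * H[i] for i in range(h)] for H, s in zip(client_H, client_s)]

# Server: plain average of each head's uploaded product.
avg_sH = [np.mean([u[i] for u in uploads], axis=0) for i in range(h)]

# Global update implied by the averaged products (scale already folded in, so s_i = 1).
aggregated = multihead_update(B, avg_sH, A, np.ones(h))

# FedAvg target: the mean of the clients' individual Delta W updates.
ideal = np.mean([multihead_update(B, H, A, s) for H, s in zip(client_H, client_s)], axis=0)
print(np.allclose(aggregated, ideal))  # True

# Contrast: averaging vanilla LoRA factors separately is generally inexact.
lora_B = [rng.standard_normal((d_out, r)) for _ in range(num_clients)]
lora_A = [rng.standard_normal((r, d_in)) for _ in range(num_clients)]
naive = np.mean(lora_B, axis=0) @ np.mean(lora_A, axis=0)
exact = np.mean([b @ a for b, a in zip(lora_B, lora_A)], axis=0)
print(np.allclose(naive, exact))  # False
```

Folding the scale into the core (calling `multihead_update` with $s_i = 1$) is just a convenient choice for this check; how $s_i$ is re-initialized on clients after each round is not specified here.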

Abstract

Large language models (LLMs) have not yet effectively leveraged the vast amounts of edge-device data, and federated learning (FL) offers a promising paradigm to collaboratively fine-tune LLMs without transferring private edge data to the cloud. To operate within the computation and communication constraints of edge devices, recent literature on federated fine-tuning of LLMs proposes the use of low-rank adaptation (LoRA) and similar parameter-efficient methods. However, LoRA-based methods suffer from accuracy degradation in FL settings, primarily because of data and computational heterogeneity across clients. We propose Ravan, an adaptive multi-head LoRA method that balances parameter efficiency and model expressivity by reparameterizing the weight updates as the sum of multiple LoRA heads $s_i\mathbf{B}_i\mathbf{H}_i\mathbf{A}_i$ in which only the core matrices $\mathbf{H}_i$ and their lightweight scaling factors $s_i$ are trained. These trainable scaling factors let the optimization focus on the most useful heads, recovering a higher-rank approximation of the full update without increasing the number of communicated parameters, since clients upload $s_i\mathbf{H}_i$ directly. Experiments on vision and language benchmarks show that Ravan improves test accuracy by $2$–$8\%$ over prior parameter-efficient baselines, making it a robust and scalable solution for federated fine-tuning of LLMs.
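To make the rank-versus-communication trade-off concrete, this sketch assumes each head pairs a frozen $d_{out} \times r$ basis $\mathbf{B}_i$ and a frozen $r \times d_{in}$ basis $\mathbf{A}_i$ with a trainable $r \times r$ core $\mathbf{H}_i$, so a client uploads $h r^2$ values per layer (the $h$ products $s_i\mathbf{H}_i$). It also assumes the frozen bases are shared once, for example regenerated from a common seed, rather than sent every round. With generic random bases the summed update reaches rank $hr$, whereas vanilla LoRA given the same per-layer upload budget is limited to a much smaller rank. The dimensions, the Gaussian initialization, and the helper names are hypothetical.

```python
import numpy as np

def lora_comm_params(d_out, d_in, r):
    """Vanilla LoRA: B (d_out x r) and A (r x d_in) are both trained and sent."""
    return r * (d_out + d_in)

def multihead_comm_params(r, h):
    """Multi-head sketch: a client sends h products s_i * H_i, each r x r."""
    return h * r * r

rng = np.random.default_rng(0)
d_out = d_in = 64
r, h = 8, 4  # hypothetical per-head rank and number of heads

# Frozen Gaussian bases (an assumed initialization) plus trainable cores and scales.
B = [rng.standard_normal((d_out, r)) for _ in range(h)]
A = [rng.standard_normal((r, d_in)) for _ in range(h)]
H = [rng.standard_normal((r, r)) for _ in range(h)]
s = rng.uniform(0.5, 1.5, size=h)

delta_w = sum(s_i * B_i @ H_i @ A_i for s_i, B_i, H_i, A_i in zip(s, B, H, A))

budget = multihead_comm_params(r, h)   # 4 * 8 * 8 = 256 values per layer
lora_rank = budget // (d_out + d_in)   # largest LoRA rank that fits: 256 // 128 = 2
print(budget, lora_comm_params(d_out, d_in, lora_rank))  # 256 256
print(np.linalg.matrix_rank(delta_w), lora_rank)         # 32 2
```

The frozen bases still occupy memory on each client ($hr(d_{out}+d_{in})$ values per layer under these assumptions), so the savings are in trained and communicated parameters, not in stored ones.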

Paper Structure

This paper contains 38 sections, 8 equations, 8 figures, 14 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Singular value spectra of the weight updates $\Delta \mathbf{W}$ for CIFAR-100 and SVHN in three training regimes. Only the 64 largest singular values are shown (hence the truncated plots). Moving from centralized learning $\rightarrow$ FL (I.I.D. clients) $\rightarrow$ FL (non-I.I.D. clients), the median shifts up and the distribution broadens, meaning a larger fraction of singular values stays near the top of the spectrum. The effective rank is therefore higher in the federated, non-I.I.D. setting.
  • Figure 2: Left: For the same parameter count, the effective rank of the LoRA update increases when a third (core) matrix and multiple heads are used. Right: Clients with tighter computational constraints can freeze some heads to reduce memory consumption.
  • Figure 3: Clients draw their trainable parameter budgets from bell-shaped, uniform, or skewed-right distributions. All Ravan variants outperform the baselines under every distribution.
  • Figure 4: Comparison of performance when using different numbers of Ravan heads at two different parameter budgets (SVHN: $N_{\text{total}}$ = 1.2 M/2.4 M, 20 Newsgroups: $N_{\text{total}}$ = 2.4 M/4.7 M).
  • Figure 5: Fraction of clients assigned to each trainable parameter budget in each distribution. The skewed-left distribution is omitted because it is never used in our experiments.
  • ...and 3 more figures
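Figures 1 and 2 compare updates by their effective rank. This summary does not state which definition the paper uses; a common choice, assumed in this sketch, is the exponential of the Shannon entropy of the normalized singular values. It equals $n$ for a flat spectrum of $n$ equal values and approaches 1 when a single singular value dominates, so a broader spectrum, as in the non-I.I.D. panel of Figure 1, yields a larger value. The function name and test matrices are hypothetical.

```python
import numpy as np

def effective_rank(delta_w, eps=1e-12):
    """Entropy-based effective rank of a weight update (assumed definition)."""
    sigma = np.linalg.svd(delta_w, compute_uv=False)  # singular values only
    p = sigma / (sigma.sum() + eps)  # normalize the spectrum into a distribution
    p = p[p > 0]                     # treat 0 * log(0) as 0
    return float(np.exp(-(p * np.log(p)).sum()))

flat = np.diag([1.0, 1.0, 1.0, 1.0])        # four equal singular values
decaying = np.diag([10.0, 1.0, 0.1, 0.01])  # one dominant direction
print(round(effective_rank(flat), 2))       # 4.0
print(round(effective_rank(decaying), 2))   # 1.43
```

Unlike the hard rank, this measure changes smoothly as energy spreads across more singular directions, which is why it suits comparing spectra like those in Figure 1.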