Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning
Arian Raje, Baris Askin, Divyansh Jhunjhunwala, Gauri Joshi
TL;DR
Ravan tackles federated fine-tuning of LLMs under data and computational heterogeneity with an adaptive multi-head LoRA framework. It reparameterizes the weight update as a sum of $h$ heads $s_i\mathbf{B}_i\mathbf{H}_i\mathbf{A}_i$, where the bases $\mathbf{B}_i$ and $\mathbf{A}_i$ are frozen and only the cores $\mathbf{H}_i$ and scaling factors $s_i$ are trained. This increases the effective rank of the update while preserving exact aggregation and the same communication cost. Across vision and language benchmarks, Ravan consistently outperforms prior PEFT baselines, with larger gains in non-IID settings, and it scales to larger LLaMA-based models on GLUE tasks. The approach enables robust, edge-efficient fine-tuning of LLMs using on-device data and heterogeneous hardware. The work also provides ablations on initialization, head selection, and scaling factors, and outlines practical guidelines for deploying Ravan in cross-device FL scenarios.
Abstract
Large language models (LLMs) have not yet effectively leveraged the vast amounts of edge-device data, and federated learning (FL) offers a promising paradigm to collaboratively fine-tune LLMs without transferring private edge data to the cloud. To operate within the computation and communication constraints of edge devices, recent literature on federated fine-tuning of LLMs proposes the use of low-rank adaptation (LoRA) and similar parameter-efficient methods. However, LoRA-based methods suffer from accuracy degradation in FL settings, primarily because of data and computational heterogeneity across clients. We propose Ravan, an adaptive multi-head LoRA method that balances parameter efficiency and model expressivity by reparameterizing the weight updates as the sum of multiple LoRA heads $s_i\mathbf{B}_i\mathbf{H}_i\mathbf{A}_i$, in which only the core matrices $\mathbf{H}_i$ and their lightweight scaling factors $s_i$ are trained. These trainable scaling factors let the optimization focus on the most useful heads, recovering a higher-rank approximation of the full update without increasing the number of communicated parameters, since clients upload $s_i\mathbf{H}_i$ directly. Experiments on vision and language benchmarks show that Ravan improves test accuracy by $2$-$8\%$ over prior parameter-efficient baselines, making it a robust and scalable solution for federated fine-tuning of LLMs.
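
To make the reparameterization concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds the update $\sum_i s_i\mathbf{B}_i\mathbf{H}_i\mathbf{A}_i$ from frozen bases shared by all clients and checks that averaging the uploaded products $s_i\mathbf{H}_i$ head by head gives the same matrix as averaging the clients' full updates. The dimensions, the Gaussian initialization of the bases, the random stand-ins for locally trained $\mathbf{H}_i$ and $s_i$, and the uniform client weighting are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not values from the paper).
d_out, d_in, r, h, n_clients = 16, 12, 2, 3, 4

# Frozen bases B_i (d_out x r) and A_i (r x d_in), identical on every client.
# Gaussian initialization is a placeholder choice for this sketch.
B = [rng.standard_normal((d_out, r)) for _ in range(h)]
A = [rng.standard_normal((r, d_in)) for _ in range(h)]


def delta_w(scaled_cores):
    """Weight update sum_i B_i (s_i H_i) A_i for one set of scaled cores."""
    return sum(B[i] @ scaled_cores[i] @ A[i] for i in range(h))


# Each client ends local training with its own H_i and s_i; random values
# stand in for trained parameters. Clients upload the products s_i * H_i.
client_uploads = []
for _ in range(n_clients):
    H = [rng.standard_normal((r, r)) for _ in range(h)]
    s = rng.uniform(0.5, 1.5, size=h)
    client_uploads.append([s[i] * H[i] for i in range(h)])

# Server side: uniform average of the uploaded cores, head by head.
avg_cores = [
    np.mean([upload[i] for upload in client_uploads], axis=0) for i in range(h)
]

# Because B_i and A_i are frozen and shared, the update is linear in s_i H_i,
# so averaging the cores equals averaging the full updates.
update_from_avg_cores = delta_w(avg_cores)
avg_of_client_updates = np.mean([delta_w(u) for u in client_uploads], axis=0)
aggregation_exact = np.allclose(update_from_avg_cores, avg_of_client_updates)

# The summed update can reach rank up to h * r, although each head has rank <= r.
effective_rank = np.linalg.matrix_rank(update_from_avg_cores)

# Values uploaded per client per adapted layer in this sketch.
uploaded_values = h * r * r

print(aggregation_exact, effective_rank, uploaded_values)
```

The equality holds because the update is linear in $s_i\mathbf{H}_i$ once the bases are fixed. By contrast, if each client trained its own low-rank factors, the product of the averaged factors would generally differ from the average of the products.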
