Rapid Switching and Multi-Adapter Fusion via Sparse High Rank Adapters
Kartikeya Bhardwaj, Nilesh Prasad Pandey, Sweta Priyadarshi, Viswanath Ganapathy, Rafael Esteves, Shreya Kadambi, Shubhankar Borse, Paul Whatmough, Risheek Garrepalli, Mart Van Baalen, Harris Teague, Markus Nagel
TL;DR
Sparse High Rank Adapters (SHiRA) address LoRA's limitations for edge deployment by finetuning only about $1\%$-$2\%$ of the base weights. This extremely sparse update enables rapid on-device adapter switching and reduces interference between adapters. SHiRA trains with gradient masking, using several families of sparsity masks, to obtain high-rank sparse adapters that add no parameters to the forward pass. At inference it supports rapid switching by overwriting only the affected weights with a scatter_op instead of performing a full fusion. Across vision and language tasks on models including LLaMA, LLaMA2, and Stable Diffusion, SHiRA outperforms LoRA in both single- and multi-adapter settings. Notable gains include up to $2.7\%$ higher commonsense reasoning accuracy on LLMs and an average improvement of $6.69\%$ in multi-adapter fusion on LLaMA2-7B. SHiRA also reduces peak GPU memory by $16.63\%$ and makes CPU weight overwrites up to $10\times$ faster. The method is complementary to advanced LoRA variants such as DoRA and exhibits orthogonality in multi-adapter fusion. It offers a practical path to edge-friendly, low-overhead parameter-efficient finetuning (PEFT) with robust adaptability. These contributions advance efficient on-device finetuning, rapid switching, and reliable multi-concept fusion for large vision and language models.
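The gradient-masked training described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Weights are treated as a flat list, and `random_mask` and `magnitude_mask` are hypothetical examples of two simple mask choices (uniform random and top magnitude). `masked_sgd_step` is a plain SGD update with the gradient zeroed outside the mask, so only about 1% of the entries can change.

```python
import random


def random_mask(n_weights, density, seed=0):
    """Hypothetical mask: pick about density * n_weights flat indices uniformly at random."""
    rng = random.Random(seed)
    k = max(1, round(n_weights * density))
    return sorted(rng.sample(range(n_weights), k))


def magnitude_mask(weights, density):
    """Hypothetical mask: pick the flat indices of the largest-magnitude weights."""
    k = max(1, round(len(weights) * density))
    by_size = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return sorted(by_size[:k])


def masked_sgd_step(weights, grads, mask, lr):
    """One SGD step with gradients outside the mask zeroed, so only masked weights move."""
    allowed = set(mask)
    return [w - lr * g if i in allowed else w
            for i, (w, g) in enumerate(zip(weights, grads))]


# Toy usage: 1,000 "weights", 1% of them trainable.
rng = random.Random(1)
base = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
grads = [1.0] * len(base)  # stand-in gradient; a real one would come from backprop
mask = random_mask(len(base), density=0.01)
updated = masked_sgd_step(base, grads, mask, lr=0.1)
changed = [i for i, (b, u) in enumerate(zip(base, updated)) if b != u]
print(len(mask), changed == mask)  # 10 True
```

Because the mask is fixed before training, the trained adapter is exactly the set of (index, value) pairs inside the mask. In PyTorch, a comparable effect could be obtained with a gradient hook (`Tensor.register_hook`) that multiplies each gradient by a 0/1 mask. Optimizer features such as momentum or decoupled weight decay can still move weights whose gradient is zero, so they would need separate care.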
Abstract
In this paper, we propose Sparse High Rank Adapters (SHiRA), which directly finetune 1-2% of the base model weights while leaving the others unchanged, thus resulting in a highly sparse adapter. This high sparsity incurs no inference overhead, enables rapid switching directly in the fused mode, and significantly reduces concept loss during multi-adapter fusion. Our extensive experiments on large vision models (LVMs) and large language models (LLMs) demonstrate that finetuning merely 1-2% of the base model's parameters is sufficient for many adapter tasks and significantly outperforms Low Rank Adaptation (LoRA). We also show that SHiRA is orthogonal to advanced LoRA methods such as DoRA and can be easily combined with existing techniques.
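Switching directly in the fused mode, and fusing several sparse adapters, can be illustrated with a toy sketch. For illustration only, it assumes that an adapter is stored as a dictionary mapping each flat weight index to its finetuned value. `extract_adapter`, `apply_adapter`, `revert_adapter`, `fuse_adapters`, and `overlap` are hypothetical helpers. Summing sparse deltas is one simple fusion rule and not necessarily the one used in the paper.

```python
def extract_adapter(base, finetuned):
    """Keep only the entries that finetuning changed, as {flat_index: new_value}."""
    return {i: f for i, (b, f) in enumerate(zip(base, finetuned)) if f != b}


def apply_adapter(weights, adapter):
    """Scatter-style overwrite in place; return the old values so the switch can be undone."""
    saved = {i: weights[i] for i in adapter}
    for i, value in adapter.items():
        weights[i] = value
    return saved


def revert_adapter(weights, saved):
    """Restore the values that apply_adapter overwrote."""
    for i, value in saved.items():
        weights[i] = value


def fuse_adapters(base, adapters):
    """Naive fusion: add each adapter's sparse delta (new value minus base value) to the base."""
    fused = list(base)
    for adapter in adapters:
        for i, value in adapter.items():
            fused[i] += value - base[i]
    return fused


def overlap(a, b):
    """Number of weight positions touched by both adapters."""
    return len(a.keys() & b.keys())


# Toy usage with a 10-weight "layer" and two hypothetical adapters.
base = [0.0] * 10
ft_a = list(base)
ft_a[1] = 0.5
ft_a[4] = -0.2
ft_b = list(base)
ft_b[7] = 0.3
a = extract_adapter(base, ft_a)
b = extract_adapter(base, ft_b)

weights = list(base)                # deployed weights, already in fused form
saved = apply_adapter(weights, a)   # switch to adapter A
revert_adapter(weights, saved)      # back to the base model
saved = apply_adapter(weights, b)   # switch to adapter B
print(weights == ft_b, overlap(a, b))  # True 0
print(fuse_adapters(base, [a, b]))
```

Switching touches only the stored indices, so its cost scales with the adapter's 1-2% of entries rather than with the full weight matrix. In PyTorch, the overwrite step would typically be a single indexed assignment or `Tensor.scatter_` call on a flattened weight. When two adapters touch disjoint positions, as in this toy case, summing their deltas leaves each adapter's entries exactly as trained. `overlap` gives only a crude count of positions where adapters could interfere.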
