MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning
Pengjie Ren, Chengshun Shi, Shiguang Wu, Mengqi Zhang, Zhaochun Ren, Maarten de Rijke, Zhumin Chen, Jiahuan Pei
TL;DR
MELoRA introduces a parameter-efficient fine-tuning method that freezes pretrained weights and trains a group of mini LoRAs in parallel. Together they form a block-diagonal update with an equivalent rank of $n \times r$ while using $n$ times fewer trainable parameters than a standard LoRA of the same rank. The rank of a block-diagonal matrix is the sum of the ranks of its blocks, so this diagonal concatenation yields a higher rank than a standard LoRA with the same parameter budget, improving generalization with fewer parameters. Empirical results on GLUE (RoBERTa-base) and INSTRUCTEVAL (Llama-2-7B) show that MELoRA outperforms LoRA and other baselines by notable margins, especially on low-data or instruction-following tasks, with up to 36× fewer trainable parameters. Analyses indicate that the equivalent rank is a key driver of performance and that the optimal number of mini LoRAs $n$ and mini-LoRA rank $r$ depend on the dataset, highlighting MELoRA’s flexibility and efficiency for large-scale PEFT.
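The rank and parameter arithmetic behind the block-diagonal update can be checked with a small sketch. It assumes a square $d \times d$ weight with $d$ divisible by $n$, and each mini LoRA acting on one $(d/n) \times (d/n)$ diagonal block. This is our reading of the description above, not the authors' code. All function names are hypothetical, and the identity sub-blocks in `mini_lora_factors` are an illustrative choice that makes every mini LoRA provably rank $r$.

```python
from fractions import Fraction
import random


def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def rank(M):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in M]
    n_rows, n_cols = len(m), len(m[0])
    rk = 0
    for c in range(n_cols):
        pivot = next((i for i in range(rk, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(n_rows):
            if i != rk and m[i][c] != 0:
                f = m[i][c] / m[rk][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
        if rk == n_rows:
            break
    return rk


def mini_lora_factors(k, r, rng):
    """Hypothetical factors for one mini LoRA on a k x k block: B is k x r, A is r x k.
    Identity sub-blocks make B full column rank and A full row rank, so rank(B A) == r."""
    B = [[int(i == j) if i < r else rng.randint(-2, 2) for j in range(r)] for i in range(k)]
    A = [[int(i == j) if j < r else rng.randint(-2, 2) for j in range(k)] for i in range(r)]
    return B, A


def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    size = sum(len(b) for b in blocks)
    out = [[0] * size for _ in range(size)]
    off = 0
    for b in blocks:
        for i, row in enumerate(b):
            out[off + i][off:off + len(row)] = row
        off += len(b)
    return out


def lora_params(d, rank):
    """Trainable parameters of a standard LoRA on a d x d weight: B (d x rank) + A (rank x d)."""
    return 2 * d * rank


def melora_params(d, n, r):
    """Trainable parameters of n mini LoRAs of rank r, each on a (d/n) x (d/n) block."""
    assert d % n == 0, "this sketch assumes d is divisible by n"
    return n * 2 * (d // n) * r


d, n, r = 12, 4, 2
rng = random.Random(0)
blocks = [matmul(*mini_lora_factors(d // n, r, rng)) for _ in range(n)]
delta_w = block_diag(blocks)

print("equivalent rank:", rank(delta_w))                   # 8 (= n * r)
print("MELoRA params:", melora_params(d, n, r))           # 48 (= 2 * d * r)
print("LoRA params at rank n*r:", lora_params(d, n * r))  # 192 (n times more)
```

With $d = 12$, $n = 4$, $r = 2$, the mini LoRAs use $2dr = 48$ parameters, the same budget as a plain rank-2 LoRA, yet the block-diagonal update has rank 8. A standard LoRA of rank 8 would need 192 parameters, $n = 4$ times more.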
Abstract
Parameter-efficient fine-tuning (PEFT) is a popular method for tailoring pre-trained large language models (LLMs), especially as the models' scale and the diversity of tasks increase. Low-rank adaptation (LoRA) is based on the idea that the adaptation process is intrinsically low-dimensional, i.e., significant model changes can be represented with relatively few parameters. However, decreasing the rank can increase generalization error on specific tasks compared to full-parameter fine-tuning. We present MELoRA, a mini-ensemble of low-rank adapters that uses fewer trainable parameters while maintaining a higher rank, thereby offering improved performance potential. The core idea is to freeze the original pretrained weights and train a group of mini LoRAs, each with only a small number of parameters. This can capture a significant degree of diversity among the mini LoRAs, thus promoting better generalization. We conduct a theoretical analysis and empirical studies on various NLP tasks. Our experimental results show that, compared to LoRA, MELoRA achieves better performance with 8 times fewer trainable parameters on natural language understanding tasks and 36 times fewer trainable parameters on instruction-following tasks, which demonstrates the effectiveness of MELoRA.
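The core idea (a frozen pretrained weight plus a group of mini LoRAs) can be sketched as a forward pass. The sketch assumes the input vector is split into $n$ equal contiguous chunks, one per mini LoRA, which is how we read the block-diagonal update. The names are hypothetical, not the authors' code. The usual LoRA scaling factor and the zero initialization of $B$ are omitted; random integer factors are used so the equivalence check is non-trivial and exact. The check confirms that applying each mini LoRA to its own chunk gives the same output as multiplying by the dense block-diagonal $\Delta W$, so $\Delta W$ never has to be materialized.

```python
import random


def matvec(M, v):
    """Dense matrix-vector product for list-of-lists matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mini_lora_delta(minis, x):
    """Compute ΔW x by applying mini LoRA i to the i-th chunk of x and concatenating.
    minis is a list of (B_i, A_i) pairs, B_i: k x r, A_i: r x k, with k = len(x) // n."""
    k = len(x) // len(minis)
    out = []
    for i, (B, A) in enumerate(minis):
        chunk = x[i * k:(i + 1) * k]
        out.extend(matvec(B, matvec(A, chunk)))  # B_i (A_i x_i)
    return out


def melora_forward(W0, minis, x):
    """h = W0 x + ΔW x. W0 stays frozen; only the (B_i, A_i) pairs would be trained."""
    return [a + b for a, b in zip(matvec(W0, x), mini_lora_delta(minis, x))]


def dense_block_diag_delta(minis, d):
    """Reference only: materialize ΔW = diag(B_1 A_1, ..., B_n A_n) as a dense d x d matrix."""
    k = d // len(minis)
    D = [[0] * d for _ in range(d)]
    for i, (B, A) in enumerate(minis):
        for p in range(k):
            for q in range(k):
                D[i * k + p][i * k + q] = sum(B[p][t] * A[t][q] for t in range(len(A)))
    return D


d, n, r = 8, 4, 1
k = d // n
rng = random.Random(0)
W0 = [[rng.randint(-3, 3) for _ in range(d)] for _ in range(d)]  # stand-in for a frozen pretrained weight
minis = [
    ([[rng.randint(-3, 3) for _ in range(r)] for _ in range(k)],  # B_i: k x r
     [[rng.randint(-3, 3) for _ in range(k)] for _ in range(r)])  # A_i: r x k
    for _ in range(n)
]
x = [rng.randint(-3, 3) for _ in range(d)]

h = melora_forward(W0, minis, x)
h_ref = [a + b for a, b in zip(matvec(W0, x), matvec(dense_block_diag_delta(minis, d), x))]
trainable = sum(len(B) * len(B[0]) + len(A) * len(A[0]) for B, A in minis)
print(h == h_ref, trainable)  # True 16 (= 2 * d * r)
```

Only the $n$ pairs $(B_i, A_i)$ count as trainable here, $2dr = 16$ values in total, while the $d \times d$ matrix `W0` is left untouched.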
