MoRE: A Mixture of Low-Rank Experts for Adaptive Multi-Task Learning

Dacao Zhang, Kun Zhang, Shimao Chu, Le Wu, Xin Li, Si Wei

TL;DR

MoRE addresses the rigidity of fixed LoRA ranks in multi-task PEFT by introducing a Mixture of Low-Rank Experts, in which each LoRA rank acts as a task-aware expert selected by a gating mechanism driven by trainable task embeddings. The approach combines an adaptive rank selector with contrastive learning and balanced data sampling to capture task relationships and stabilize multi-task training, improving multi-task performance without additional inference cost. Experiments on GLUE and related benchmarks show gains over LoRA and other multi-task baselines, including in few-shot transfer scenarios, and give insight into rank usage and the structure of the task embeddings. The work offers a practical, scalable route to efficient multi-task fine-tuning of large language models, with code and models publicly released.

Abstract

With the rapid development of Large Language Models (LLMs), Parameter-Efficient Fine-Tuning (PEFT) methods, which aim to fine-tune LLMs efficiently by updating only a small number of parameters, have gained significant attention. As a representative PEFT method, Low-Rank Adaptation (LoRA) introduces low-rank matrices to approximate the incremental tuning parameters and achieves impressive performance across multiple scenarios. Many follow-up methods have since been proposed to improve it. However, these methods either focus on single-task scenarios or train a separate LoRA module for each task in multi-task scenarios, which limits the efficiency and effectiveness of LoRA in multi-task settings. To better adapt to multi-task fine-tuning, we propose a novel Mixture of Low-Rank Experts (MoRE) for multi-task PEFT. Specifically, instead of using an individual LoRA for each task, we align different ranks of a LoRA module with different tasks; we call these low-rank experts. Moreover, we design a novel adaptive rank selector that selects the appropriate expert for each task. By jointly training the low-rank experts, MoRE enhances the adaptability and efficiency of LoRA in multi-task scenarios. Finally, we conduct extensive experiments on multiple multi-task benchmarks with different LLMs to verify model performance. Experimental results demonstrate that, compared to traditional LoRA and its variants, MoRE significantly improves the performance of LLMs in multi-task scenarios and incurs no additional inference cost. We also release the model and code for the community.

Paper Structure

This paper contains 28 sections, 10 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: The overall framework of our proposed MoRE.
  • Figure 2: (a)-(b) The distribution of expert allocation. (c) Visualization of the task embeddings.
  • Figure 3: Relative Training Speed of Different Parameter-Efficient Fine-Tuning Methods.
  • Figure 4: Parameter Sensitivity Test on $\lambda$ in the total loss equation and hidden dimension of task embedding.
  • Figure 5: Visualization of Task Embeddings in Layer 1.
  • ...and 2 more figures