MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

Chuanyu Tang, Yilong Chen, Zhenyu Zhang, Junyuan Shang, Wenyuan Zhang, Yong Huang, Tingwen Liu

TL;DR

The hypothesis is that low-rank LoRA already captures sufficient intrinsic information and that MoR can derive high-rank information through mathematical transformations of the low-rank components. MoR thereby reduces the learning difficulty of LoRA and enhances its multi-task capabilities.

Abstract

Low-Rank Adaptation (LoRA) has driven research on matching the performance of full fine-tuning. However, significant challenges remain: (1) Simply increasing the rank of LoRA does not effectively capture high-rank information, which leads to a performance bottleneck. (2) MoE-style LoRA methods substantially increase parameters and inference latency, contradicting the goals of efficient fine-tuning and ease of application. To address these challenges, we introduce Mixture of Ranks (MoR), which learns rank-specific information for different tasks based on the input and efficiently integrates multi-rank information. We first propose a new framework that equates the integration of multiple LoRAs with expanding the rank of LoRA. Moreover, we hypothesize that low-rank LoRA already captures sufficient intrinsic information, and that MoR can derive high-rank information through mathematical transformations of the low-rank components. Thus, MoR reduces the learning difficulty of LoRA and enhances its multi-task capabilities. MoR achieves impressive results, delivering a 1.31% performance improvement over baseline methods while using only 93.93% of their parameters.
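
The statement that integrating multiple LoRAs is equivalent to expanding the rank can be checked numerically with a standard linear-algebra identity: a weighted sum of N rank-r updates equals a single product of concatenated factors, whose rank is at most N·r. The following is a minimal NumPy sketch of that identity only. The shapes, the fixed mixing weights, and the function names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def sum_of_loras(As, Bs, weights):
    """Weighted sum of LoRA updates: sum_i w_i * (B_i @ A_i)."""
    return sum(w * B @ A for w, A, B in zip(weights, As, Bs))

def concatenated_lora(As, Bs, weights):
    """The same update written as one product of stacked factors."""
    # Stack the down-projections row-wise: shape (N*r, d_in).
    A_cat = np.concatenate(As, axis=0)
    # Stack the weighted up-projections column-wise: shape (d_out, N*r).
    B_cat = np.concatenate([w * B for w, B in zip(weights, Bs)], axis=1)
    return B_cat @ A_cat

# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
d_in, d_out, r, n = 16, 12, 2, 3
As = [rng.standard_normal((r, d_in)) for _ in range(n)]
Bs = [rng.standard_normal((d_out, r)) for _ in range(n)]
weights = [0.5, 0.3, 0.2]

delta_sum = sum_of_loras(As, Bs, weights)
delta_cat = concatenated_lora(As, Bs, weights)
```

Both functions produce the same `(d_out, d_in)` update, so mixing `n` adapters of rank `r` can be read as one adapter whose rank is bounded by `n * r`.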

Paper Structure

This paper contains 35 sections, 15 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Integrating multiple LoRAs in MoR can be regarded as increasing the rank of LoRA. MoR learns a shared parameter, which is then mapped to each subtask space through a transformation. MoR focuses the learning goal on a small amount of effective information, greatly reducing the learning cost while increasing the rank.
  • Figure 2: Overview of the MoR framework. The method replaces the original FFN module of the LLM with the MoR plugin. The LoRA matrices $A$ and $B$ are shared, and the diagonal matrix $\Lambda$, built from multi-dimensional scaling vectors, is specialized with domain-specific information, mapping the shared LoRA matrices into specific subspaces.
  • Figure 3: Vector stacking method that speeds up training and inference through parallel matrix computation on GPUs.
  • Figure 4: Performance and parameter scale comparison between baselines and MoR. MoR E1R8 indicates one expert with a shared LoRA matrix rank of 8.
  • Figure 5: Variation in the number of trainable parameters across different numbers of experts.
  • ...and 2 more figures
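
The design described in the Figure 2 caption (shared $A$ and $B$, with a per-expert diagonal scaling $\Lambda$) might be sketched roughly as below. This is a sketch under stated assumptions, not the paper's implementation: the softmax router, the placement of $\Lambda$ between $A$ and $B$, and all names and shapes are hypothetical. The second function collapses the per-expert loop into one matrix product over stacked scaling vectors, in the spirit of the stacking idea in the Figure 3 caption; whether it matches the authors' method is likewise an assumption.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mor_delta_loop(x, A, B, lambdas, W_router):
    """Per-expert loop: sum_i g_i(x) * B @ diag(lambda_i) @ A @ x."""
    gates = softmax(x @ W_router.T)            # (batch, n_experts), assumed router
    out = np.zeros((x.shape[0], B.shape[0]))
    for i, lam in enumerate(lambdas):
        h = (x @ A.T) * lam                    # apply diag(lambda_i) elementwise
        out += gates[:, [i]] * (h @ B.T)       # weight expert i by its gate
    return out

def mor_delta_stacked(x, A, B, lambdas, W_router):
    """Same result using the stacked (n_experts, r) scaling vectors at once."""
    gates = softmax(x @ W_router.T)            # (batch, n_experts)
    scale = gates @ lambdas                    # (batch, r) mixed scaling per input
    return ((x @ A.T) * scale) @ B.T

# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
batch, d_in, d_out, r, n_experts = 4, 16, 12, 8, 3
x = rng.standard_normal((batch, d_in))
A = rng.standard_normal((r, d_in))             # shared down-projection
B = rng.standard_normal((d_out, r))            # shared up-projection
lambdas = rng.standard_normal((n_experts, r))  # one scaling vector per expert
W_router = rng.standard_normal((n_experts, d_in))

y_loop = mor_delta_loop(x, A, B, lambdas, W_router)
y_stacked = mor_delta_stacked(x, A, B, lambdas, W_router)
```

Under these assumptions each extra expert adds only `r` scaling parameters plus one router row, rather than a full pair of LoRA matrices, which is one way to read the parameter savings claimed in the abstract.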