LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks

Hanqing Wang, Bowen Ping, Shuo Wang, Xu Han, Yun Chen, Zhiyuan Liu, Maosong Sun

TL;DR

Experiments across six generative tasks demonstrate that the proposed LoRA-Flow method consistently outperforms baselines with task-level fusion weights, underscoring the necessity of introducing dynamic fusion weights for LoRA combination.

Abstract

LoRA employs lightweight modules to customize large language models (LLMs) for each downstream task or domain, where different learned additional modules represent diverse skills. Combining existing LoRAs to address new tasks can enhance the reusability of learned LoRAs, which is particularly beneficial for tasks with limited annotated data. Most prior work on LoRA combination relies on task-level weights for each involved LoRA, so all examples and tokens share the same LoRA weights. However, in generative tasks, different tokens may require different skills. Taking the Chinese math task as an example, understanding the problem description may depend more on the Chinese LoRA, while the calculation part may rely more on the math LoRA. To this end, we propose LoRA-Flow, which utilizes dynamic weights to adjust the impact of different LoRAs. The weights at each step are determined by a fusion gate with extremely few parameters, which can be learned with only 200 training examples. Experiments across six generative tasks demonstrate that our method consistently outperforms baselines with task-level fusion weights. This underscores the necessity of introducing dynamic fusion weights for LoRA combination.

Paper Structure (31 sections, 8 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Illustration of the proposed LoRA-Flow method. For the token $y_t$ at the $t$-th step, we use a gate that conditions on the prefix $\mathbf{y}_{<t}$ to determine the fusion weights. The dynamic fusion weights are intended to control the influence of different LoRA modules, to better cope with various types of tokens in generative tasks. Red and blue rectangles represent the weights assigned to the two involved LoRAs.
  • Figure 2: Left: we use layer-wise fusion gates to facilitate dynamic LoRA fusion, which project input hidden states of each layer into fusion weights. Right: for a certain module, the provided fusion weights are used to aggregate the outputs of different LoRAs. Since our goal is to leverage the abilities acquired by existing LoRAs to address new tasks, we only train the fusion gate with a few examples, while keeping both the model and the LoRAs frozen. The number of parameters of the fusion gate is only approximately 0.2% of those in a LoRA.
  • Figure 3: Average fusion weights for the Zh Chat and En Math LoRAs across different layers.
  • Figure 4: Average fusion weights for the Zh Chat and En Math LoRAs at different time steps.
  • Figure 5: Detailed analysis of the fusion procedure of LoRA-Flow. The upper panel shows the fusion weights for each token, while the lower panel shows the corresponding text. The weights reveal three segments where the fusion weight of the Zh Chat LoRA noticeably decreases while that of the En Math LoRA increases. The tokens in these segments are highlighted in green, yellow, and red, respectively. Surprisingly, these three segments mainly contain numbers, which are closely related to mathematical reasoning ability. English translations of the input and output Chinese text are provided in a separate figure in the Appendix.
  • ...and 2 more figures
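
The mechanism described in the captions of Figures 1 and 2 (a per-layer gate that projects the current hidden state into fusion weights, which then aggregate the outputs of frozen LoRAs) can be illustrated with a small pure-Python sketch. It is not the authors' implementation. All names (`fusion_weights`, `fused_output`, `gate_matrix`, the toy dimensions) are hypothetical, and the softmax normalization is an assumption: the captions only state that the gate projects hidden states into fusion weights. In this sketch only `gate_matrix` would be trainable; the base weight and the LoRA matrices stay fixed.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fusion_weights(gate_matrix, hidden_state):
    """Project one token's layer input into one weight per LoRA.

    Assumption: softmax normalization (not stated in the captions).
    """
    return softmax(matvec(gate_matrix, hidden_state))


def lora_delta(lora_a, lora_b, hidden_state):
    """Low-rank update B(Ax) contributed by a single frozen LoRA."""
    return matvec(lora_b, matvec(lora_a, hidden_state))


def fused_output(base_weight, loras, gate_matrix, hidden_state):
    """Frozen base output plus the gate-weighted sum of LoRA outputs."""
    weights = fusion_weights(gate_matrix, hidden_state)
    output = matvec(base_weight, hidden_state)
    for weight, (lora_a, lora_b) in zip(weights, loras):
        delta = lora_delta(lora_a, lora_b, hidden_state)
        output = [o + weight * d for o, d in zip(output, delta)]
    return output, weights


def random_matrix(rows, cols, rng):
    """Toy random matrix; stands in for pretrained, frozen parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
hidden_dim, rank, num_loras = 4, 2, 2  # toy sizes, chosen for illustration

base_weight = random_matrix(hidden_dim, hidden_dim, rng)  # frozen
loras = [
    (random_matrix(rank, hidden_dim, rng), random_matrix(hidden_dim, rank, rng))
    for _ in range(num_loras)
]  # frozen (A, B) pairs, e.g. a language LoRA and a math LoRA
gate_matrix = random_matrix(num_loras, hidden_dim, rng)  # the only trainable part

# Each decoding step sees a different hidden state, so the weights change per token.
for step in range(3):
    hidden_state = [rng.uniform(-1.0, 1.0) for _ in range(hidden_dim)]
    _, weights = fused_output(base_weight, loras, gate_matrix, hidden_state)
    print(step, [round(w, 3) for w in weights])
```

A task-level baseline corresponds to replacing `fusion_weights` with a fixed list that ignores `hidden_state`; the sketch differs only in letting the weights depend on the token being generated. The gate here has `num_loras * hidden_dim` parameters, which is why such a gate can be far smaller than the LoRAs it combines.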