Unlocking the Global Synergies in Low-Rank Adapters

Zixi Zhang, Cheng Zhang, Xitong Gao, Robert D. Mullins, George A. Constantinides, Yiren Zhao

TL;DR

We present HeteroLoRA, a lightweight search algorithm that uses zero-cost proxies to allocate a limited budget of LoRA trainable parameters across the model for better fine-tuned performance.

Abstract

Low-Rank Adaptation (LoRA) has become the de facto parameter-efficient fine-tuning technique for large language models. We present HeteroLoRA, a lightweight search algorithm that uses zero-cost proxies to allocate a limited budget of LoRA trainable parameters across the model for better fine-tuned performance. Beyond allocation in standard LoRA-adapted models, we also demonstrate the efficacy of HeteroLoRA in a more challenging search space that includes both LoRA modules and LoRA-adapted shortcut connections. Experiments show that HeteroLoRA improves model performance under the same parameter budget. For example, on MRPC, we see an improvement of 1.6% in accuracy with a similar trainable-parameter budget. We will open-source our algorithm once the paper is accepted.
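The budgeted allocation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy selection rule, the function names `lora_param_count` and `allocate_by_score`, the module names, and the proxy scores are all hypothetical, chosen only to show how per-module scores and a fixed trainable-parameter budget could determine which LoRA modules are enabled.

```python
from typing import Dict, List


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA module: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def allocate_by_score(
    scores: Dict[str, float], costs: Dict[str, int], budget: int
) -> List[str]:
    """Greedily enable the highest-scoring modules that still fit in the budget.

    This greedy rule is an assumption made for illustration; the source only
    states that zero-cost proxies guide the allocation.
    """
    enabled: List[str] = []
    remaining = budget
    for name in sorted(scores, key=scores.get, reverse=True):
        if costs[name] <= remaining:
            enabled.append(name)
            remaining -= costs[name]
    return enabled


# Hypothetical proxy scores for four candidate modules (placeholder values).
scores = {
    "layer0.query": 0.2,
    "layer0.value": 0.9,
    "layer1.query": 0.1,
    "layer1.value": 0.7,
}
# Every module adapts a 768 x 768 projection with rank 8 (assumed sizes).
costs = {name: lora_param_count(768, 768, 8) for name in scores}
# The budget covers exactly two of the four modules.
budget = 2 * lora_param_count(768, 768, 8)

enabled = allocate_by_score(scores, costs, budget)
```

With these placeholder scores, the two value projections are selected and both query projections stay disabled.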

Paper Structure (27 sections, 12 equations, 6 figures, 10 tables)

Figures (6)

  • Figure 1: An illustration of the HeteroLoRA search space in a Transformer model. Given a fixed number of trainable parameters, HeteroLoRA finds an efficient heterogeneous LoRA configuration for a model on a specific task. Each standard LoRA module and each LoRA-adapted shortcut can be enabled or disabled.
  • Figure 2: Training pipeline of (a) static HeteroLoRA, where LoRA modules are enabled/disabled once at the start of training, and (b) dynamic HeteroLoRA, where LoRA modules are enabled/disabled periodically during training, e.g., every $1/5$ of an epoch.
  • Figure 3: LoRA-adapted shortcut architecture on two Transformer layers with post-layer-normalisation. Blue blocks are the residual shortcuts $s_\text{res1}$ and $s_\text{res2}$. Green blocks are the $s_\text{in}$ same-layer style "cross-layer" shortcuts. Red blocks are the $s_\text{cut}$ "cut-layer" style cross-layer shortcuts.
  • Figure 4: Frequency with which the linear projections in each model layer are enabled during dynamic HeteroLoRA training on MRPC, for LoRA and shortcut modules. A noticeable preference for value projections over query projections indicates that the value update generally contributes more to the fine-tuned performance.
  • Figure 5: Frequency with which the linear projections in each model layer are enabled during dynamic HeteroLoRA training on RTE with (a) $r=8$ and (b) $r=32$, for LoRA and shortcut modules. For each rank value, combined LoRA allocation is compared to separated LoRA allocation.
  • ...and 1 more figures